Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for PyPy and Python 3.5+ #4

Merged
merged 5 commits into from
May 2, 2017
Merged

Support for PyPy and Python 3.5+ #4

merged 5 commits into from
May 2, 2017

Conversation

jamadden
Copy link
Member

@jamadden jamadden commented May 1, 2017

Python 3.5+ because it uses % formatting on byte strings.

Coverage is OK, but not super great.

Copy link
Member

@icemac icemac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM. I only have some questions resp. suggestions.

from zope.i18nmessageid import Message
from zope.app.locales.interfaces import IPOTEntry, IPOTMaker, ITokenEater

if not hasattr(tokenize, 'generate_tokens'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier to understand if it does not contain a not. Plus it would be nice to keep the Python 3 version in the if branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why I used an if there. It should probably be a try/except ImportError.

DEFAULT_CHARSET = 'UTF-8'
DEFAULT_ENCODING = '8bit'
_import_chickens = {}, {}, ("*",) # dead chickens needed by __import__

pot_header = '''\
pot_header = u'''\
##############################################################################
#
# Copyright (c) 2003-2004 Zope Foundation and Contributors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove or at least change these year numbers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

>>> entry = POTEntry(Message("Oe", default="\xd6"))
>>> entry.write(FakeFile())
string with the DEFAULT_ENCODING, too (note that Message always runs
``unicode`` on its value and default value; on Python 3, ``unicode`` is str, so if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should disallow bytes on Python 3. There should be no code in the wild using bytes on Python 3. Porting code to Python 3 would end in a Python 3 str by using 'asdf' or u'asdf' as message argument. If someone actually uses b'asdf' he should change his code as MessageId always was a subclass of unicode.

But I fear this cannot be fixed here. *sigh*

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's kinda messed up, not just on Python 3, but also on Python 2 if the C implementation of Message isn't available: you get a UnicodeDecodeError because the Python implementation does unicode(default).

The C implementation is not doing that, which is why it works. It actually looks like the C implementation isn't getting imported on Python 3, due to a bug in the import line. (Sigh.)

I think that the Python implementation of Message should probably not be doing string conversion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zopefoundation/zope.i18nmessageid#6 Should fix both issues.

@@ -135,35 +160,58 @@ def addLocationComment(self, filename, line):

def write(self, file):
if self.comments:
file.write(self.comments)
file.write(self.comments.encode(DEFAULT_CHARSET)
if not isinstance(self.comments, bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easer to understand without the not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the not easier for two reasons. First, because bytes is the same on Python 2 and 3. The alternative spelling is comments if isinstance(comments, text_type) else comments.encode(...). So now you have to go follow the indirection of text_type to see what that's defined as, and keep in mind that it differs on Python 2 and 3.

Second, the emphasis of something needing to be done flows easier to me if it's the first part of the line: Do this if that otherwise default seems more clear than default if that otherwise do this.

But that's subjective.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you accidentally negated too much -- the alternative would be self.comments if isinstance(self.comments, bytes) else self.comments.encode(DEFAULT_CHARSET).

I've no preference for either version.

for filename, line in self.locations:
file.write('#: %s:%s\n' % (filename, line))
file.write(b'#: %s:%d\n' % (filename.encode(DEFAULT_CHARSET),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using the ''.format() notation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes objects don't support .format in Python 3 😦 % is available again as of 3.5.

python3.6 -c 'print(b"abc".format)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'format'

What did people do before 3.5? Manual joining? This module makes somewhat extensive use of % formatting so it would be annoying to try to change that.

Copy link
Member

@mgedmin mgedmin May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did people do before 3.5?

  • Complain a lot.
  • Use b''.join([b'#: ', filename,encode(DEFAULT_CHARSET), b':', str(line).encode('ascii')]) (I do not recommend this!)
  • Complain some more
  • Use ('#: %s:%d\n' % (filename, line)).encode(DEFAULT_CHARSET)
  • Petition and convince Guido to add back %-formatting to byte strings in 3.5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamadden Sorry, I was not aware of this problem.

# names[:] = filter(lambda x:x not in exclude, names)
# files += [os.path.join(dirname, name)
# for name in fnmatch.filter(names, pattern)
# if name not in exclude]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out code?

engine.catalog['default'] = {}
catalog.update(engine.catalog['default'])
for msgid, locations in catalog.items():
catalog[msgid] = map(lambda l: (l[0], l[1][0]), locations)
for msgid, locations in list(catalog.items()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is list actually necessary here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. The next line is catalog[msgid] = .... So if the items aren't copied we're mutating the dictionary as we're iterating over it, which raises exceptions.

import operator


if not hasattr(tokenize, 'generate_tokens'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@@ -257,7 +265,7 @@ def __init__(self, options):
def __call__(self, ttype, tstring, stup, etup, line):
# dispatch
## import token
## print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
## print(\, file=sys.stderr, 'ttype:', token.tok_name[ttype])
## 'tstring:', tstring
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was an amusing take on \. An automated refactoring tool/regexp replace, I presume?

isdocstring = 1
# k is the message string, v is a dictionary-set of (filename,
# lineno) tuples. We want to sort the entries in v first by
# file name and then by line number.
v = v.keys()
v = list(v.keys())
v.sort()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v = sorted(v.keys())?

- Remove unused functions from extract's TokenEater.
- Tests for py_strings.
- Pin to correct version of zope.i18nmessageid
@jamadden
Copy link
Member Author

jamadden commented May 2, 2017

The bug in zope.i18nmesageid has been addressed, so unless someone feels strongly about Python 3.4 support being part of this (held up by bytes formatting) I would say this is good to go.

Copy link
Member

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -135,35 +160,58 @@ def addLocationComment(self, filename, line):

def write(self, file):
if self.comments:
file.write(self.comments)
file.write(self.comments.encode(DEFAULT_CHARSET)
if not isinstance(self.comments, bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you accidentally negated too much -- the alternative would be self.comments if isinstance(self.comments, bytes) else self.comments.encode(DEFAULT_CHARSET).

I've no preference for either version.

for filename, line in self.locations:
file.write('#: %s:%s\n' % (filename, line))
file.write(b'#: %s:%d\n' % (filename.encode(DEFAULT_CHARSET),
Copy link
Member

@mgedmin mgedmin May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did people do before 3.5?

  • Complain a lot.
  • Use b''.join([b'#: ', filename,encode(DEFAULT_CHARSET), b':', str(line).encode('ascii')]) (I do not recommend this!)
  • Complain some more
  • Use ('#: %s:%d\n' % (filename, line)).encode(DEFAULT_CHARSET)
  • Petition and convince Guido to add back %-formatting to byte strings in 3.5

if default is not None:
default = unicode(default)
if default is not None and not isinstance(default, text_type):
default = default.decode(DEFAULT_CHARSET)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. This changes the logic: previously this method would only accept Unicode or pure-ASCII bytestrings; now we're accepting arbitrary bytestrings and assuming a specific charset.

Was this intentional? Why?

Where is this helper method being called from?

Copy link
Member Author

@jamadden jamadden May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was intentional, although maybe it wasn't fully thought through?

It was mostly intended to make it symmetric with the behaviour exposed by the POTentry doctest exposed, where different things happen on Python 2 and Python 3 for text_type(b'\xd6'). In that specific example, one raises an exception, one returns nonsense results; calling str on bytes objects in Python 3 is Not What You Want To Do.

Since UTF-8 is a strict superset of ASCII, this shouldn't be narrowing the possible inputs, just expanding them, which seemed reasonable to get consistent behaviour.

This method is called from the __openseen method of the token eater state machine when it gets the closing ) character for the _('translated expression', 'default').

So since this is called from the tokenizer, this may not actually really matter. On all Python versions, the tokenizer produces native strings; on Python 3 it does this by determining the file's encoding using comments or the BOM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very good point: we don't want to get "b'Foo'" on Python 3 with code that would raise UnicodeErrors on Python 2.

Especially when the only thing calling this is the tokenizer so it's all moot.

Thank you for taking your time to explain!

tokenize.COMMENT):
# there was no class docstring
self.__state = self.__waiting

Copy link
Member

@mgedmin mgedmin May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code elimination? sorry that was too cryptic

I'm guessing these methods were deleted because they were never called from anywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I verified in d6b9465 that (a) coverage testing didn't hit them (even though it does hit the nearly identical implementation in pygettext.py) and then (b) that this class implements a manual state machine, and the only way those two methods could be called was if they were manually assigned to self.__state or explicitly called; they weren't; and (c) they were double-dunder private methods so nobody else should be calling them either.

print >> sys.stderr, ("Could not figure out the i18n domain"
"for module %s, assuming it is OK" % module_name)
print("Could not figure out the i18n domain"
" for module %s, assuming it is OK" % module_name, file=sys.stderr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing that missing space!

"output file: %r\n" \
"Python only: %r" \
% (base_dir, path, site_zcml, exclude_dirs, domain,
include_default_domain, output_file, python_only))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wouldn't need to indent every line if you didn't add the spurious space after a ( ;)

# Sort the entries. First sort each particular entry's keys, then
# sort all the entries by their first item.
reverse = {}
for k, v in self.__messages.items():
keys = v.keys()
keys = list(v.keys())
keys.sort()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use keys = sorted(v.keys()) like elsewhere?

finally:
sys.exit = _exit
sys.stderr = stderr
sys.stdout = stdout
Copy link
Member

@mgedmin mgedmin May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using unittest.mock wouldn't make this simpler.

(Well, the code's already been written, so probably not now ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might have...but not on Python 2.7 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh! I forgot that the difference between 2.6 and 2.7 was unittest2 (extra assertions, support for skipping), but unittest.mock showed up much later.

'-s', 'no such file'], 1)
self.assertIn('does not exist', err.getvalue())

temp = tempfile.mkdtemp()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like when every invocation of tempfile.mkdtemp() has a prefix='test-zope-app-locales-' or something similar, so that people have a chance to track down which piece of software leaves clutter behind.

Second, can you maybe use self.addCleanup(shutil.rmtree, temp) here so the directory is deleted even if the test fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good points. There are other tests that use mkdtemp here and I just did what they did, but there's no reason not to clean them up too.


def test_extract(self):
me = __file__
if me.endswith(".pyc"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well handle .pyo with .endswith((".pyc", ".pyo")).

Not that I know anyone who runs unit tests of random software with python -O. (I remember people arguing one shouldn't use bare asserts for test in case they want to run the test suite with -O, but I don't believe they actually do it.)

@mgedmin
Copy link
Member

mgedmin commented May 2, 2017

Anticipating a forthcoming request: @jamadden, you now have PyPI rights to zope.app.locales.

@mgedmin
Copy link
Member

mgedmin commented May 2, 2017

Re: Python 3.4 -- I say ignore it until somebody files a wishlist bug explaining why they need Python 3.4 support.

- whitespace around a print function
- one more switch to sorted() from ().sort()
- give temp dirs distinctive names
- allow for .pyo files too
@jamadden jamadden merged commit bd04360 into master May 2, 2017
@jamadden jamadden deleted the python3 branch May 2, 2017 13:46
@jamadden
Copy link
Member Author

jamadden commented May 2, 2017

Thanks for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants