Support for PyPy and Python 3.5+ #4

jamadden · 2017-05-01T18:10:19Z

Python 3.5+ because it uses % formatting on byte strings.

Coverage is OK, but not super great.

icemac

Mostly LGTM. I only have some questions resp. suggestions.

icemac · 2017-05-01T18:46:43Z

src/zope/app/locales/extract.py

 from zope.i18nmessageid import Message
 from zope.app.locales.interfaces import IPOTEntry, IPOTMaker, ITokenEater

+if not hasattr(tokenize, 'generate_tokens'):


It would be easier to understand if it does not contain a not. Plus it would be nice to keep the Python 3 version in the if branch.

I don't know why I used an if there. It should probably be a try/except ImportError.

icemac · 2017-05-01T18:47:36Z

src/zope/app/locales/extract.py

 DEFAULT_CHARSET = 'UTF-8'
 DEFAULT_ENCODING = '8bit'
 _import_chickens = {}, {}, ("*",) # dead chickens needed by __import__

-pot_header = '''\
+pot_header = u'''\
 ##############################################################################
 #
 # Copyright (c) 2003-2004 Zope Foundation and Contributors.


Can we remove or at least change these year numbers?

icemac · 2017-05-01T18:52:35Z

src/zope/app/locales/extract.py

-    >>> entry = POTEntry(Message("Oe", default="\xd6"))
-    >>> entry.write(FakeFile())
+    string with the DEFAULT_ENCODING, too (note that Message always runs
+    ``unicode`` on its value and default value; on Python 3, ``unicode`` is str, so if


Maybe we should disallow bytes on Python 3. There should be no code in the wild using bytes on Python 3. Porting code to Python 3 would end in a Python 3 str by using 'asdf' or u'asdf' as message argument. If someone actually uses b'asdf' he should change his code as MessageId always was a subclass of unicode.

But I fear this cannot be fixed here. *sigh*

Yeah, it's kinda messed up, not just on Python 3, but also on Python 2 if the C implementation of Message isn't available: you get a UnicodeDecodeError because the Python implementation does unicode(default).

The C implementation is not doing that, which is why it works. It actually looks like the C implementation isn't getting imported on Python 3, due to a bug in the import line. (Sigh.)

I think that the Python implementation of Message should probably not be doing string conversion.

zopefoundation/zope.i18nmessageid#6 Should fix both issues.

icemac · 2017-05-01T18:53:38Z

src/zope/app/locales/extract.py

@@ -135,35 +160,58 @@ def addLocationComment(self, filename, line):

    def write(self, file):
        if self.comments:
-            file.write(self.comments)
+            file.write(self.comments.encode(DEFAULT_CHARSET)
+                       if not isinstance(self.comments, bytes)


It would be easer to understand without the not.

I find the not easier for two reasons. First, because bytes is the same on Python 2 and 3. The alternative spelling is comments if isinstance(comments, text_type) else comments.encode(...). So now you have to go follow the indirection of text_type to see what that's defined as, and keep in mind that it differs on Python 2 and 3.

Second, the emphasis of something needing to be done flows easier to me if it's the first part of the line: Do this if that otherwise default seems more clear than default if that otherwise do this.

But that's subjective.

I think you accidentally negated too much -- the alternative would be self.comments if isinstance(self.comments, bytes) else self.comments.encode(DEFAULT_CHARSET).

I've no preference for either version.

icemac · 2017-05-01T18:55:04Z

src/zope/app/locales/extract.py

        for filename, line in self.locations:
-            file.write('#: %s:%s\n' % (filename, line))
+            file.write(b'#: %s:%d\n' % (filename.encode(DEFAULT_CHARSET),


What about using the ''.format() notation?

bytes objects don't support .format in Python 3 😦 % is available again as of 3.5.

python3.6 -c 'print(b"abc".format)' Traceback (most recent call last): File "<string>", line 1, in <module> AttributeError: 'bytes' object has no attribute 'format'

What did people do before 3.5? Manual joining? This module makes somewhat extensive use of % formatting so it would be annoying to try to change that.

What did people do before 3.5?

Complain a lot.

Use b''.join([b'#: ', filename,encode(DEFAULT_CHARSET), b':', str(line).encode('ascii')]) (I do not recommend this!)

Complain some more

Use ('#: %s:%d\n' % (filename, line)).encode(DEFAULT_CHARSET)

Petition and convince Guido to add back %-formatting to byte strings in 3.5

@jamadden Sorry, I was not aware of this problem.

icemac · 2017-05-01T19:00:38Z

src/zope/app/locales/extract.py

+    #     names[:] = filter(lambda x:x not in exclude, names)
+    #     files += [os.path.join(dirname, name)
+    #               for name in fnmatch.filter(names, pattern)
+    #               if name not in exclude]


Commented-out code?

icemac · 2017-05-01T19:03:31Z

src/zope/app/locales/extract.py

            engine.catalog['default'] = {}
        catalog.update(engine.catalog['default'])
-    for msgid, locations in catalog.items():
-        catalog[msgid] = map(lambda l: (l[0], l[1][0]), locations)
+    for msgid, locations in list(catalog.items()):


Is list actually necessary here?

I think so. The next line is catalog[msgid] = .... So if the items aren't copied we're mutating the dictionary as we're iterating over it, which raises exceptions.

icemac · 2017-05-01T19:04:30Z

src/zope/app/locales/pygettext.py

-import operator
+
+
+if not hasattr(tokenize, 'generate_tokens'):


icemac · 2017-05-01T19:05:47Z

src/zope/app/locales/pygettext.py

@@ -257,7 +265,7 @@ def __init__(self, options):
    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
 ##        import token
-##        print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
+##        print(\, file=sys.stderr, 'ttype:', token.tok_name[ttype])
 ##              'tstring:', tstring


Remove this code?

That was an amusing take on \. An automated refactoring tool/regexp replace, I presume?

icemac · 2017-05-01T19:08:22Z

src/zope/app/locales/pygettext.py

                    isdocstring = 1
                # k is the message string, v is a dictionary-set of (filename,
                # lineno) tuples.  We want to sort the entries in v first by
                # file name and then by line number.
-                v = v.keys()
+                v = list(v.keys())
                v.sort()


v = sorted(v.keys())?

… POTEntry equality and sorting.

- Remove unused functions from extract's TokenEater. - Tests for py_strings. - Pin to correct version of zope.i18nmessageid

jamadden · 2017-05-02T12:38:36Z

The bug in zope.i18nmesageid has been addressed, so unless someone feels strongly about Python 3.4 support being part of this (held up by bytes formatting) I would say this is good to go.

mgedmin

LGTM!

mgedmin · 2017-05-02T12:40:25Z

src/zope/app/locales/extract.py

@@ -135,35 +160,58 @@ def addLocationComment(self, filename, line):

    def write(self, file):
        if self.comments:
-            file.write(self.comments)
+            file.write(self.comments.encode(DEFAULT_CHARSET)
+                       if not isinstance(self.comments, bytes)


I think you accidentally negated too much -- the alternative would be self.comments if isinstance(self.comments, bytes) else self.comments.encode(DEFAULT_CHARSET).

I've no preference for either version.

mgedmin · 2017-05-02T12:43:05Z

src/zope/app/locales/extract.py

        for filename, line in self.locations:
-            file.write('#: %s:%s\n' % (filename, line))
+            file.write(b'#: %s:%d\n' % (filename.encode(DEFAULT_CHARSET),


What did people do before 3.5?

Complain a lot.

Use b''.join([b'#: ', filename,encode(DEFAULT_CHARSET), b':', str(line).encode('ascii')]) (I do not recommend this!)

Complain some more

Use ('#: %s:%d\n' % (filename, line)).encode(DEFAULT_CHARSET)

Petition and convince Guido to add back %-formatting to byte strings in 3.5

mgedmin · 2017-05-02T12:47:06Z

src/zope/app/locales/extract.py

-        if default is not None:
-            default = unicode(default)
+        if default is not None and not isinstance(default, text_type):
+            default = default.decode(DEFAULT_CHARSET)


Hm. This changes the logic: previously this method would only accept Unicode or pure-ASCII bytestrings; now we're accepting arbitrary bytestrings and assuming a specific charset.

Was this intentional? Why?

Where is this helper method being called from?

It was intentional, although maybe it wasn't fully thought through?

It was mostly intended to make it symmetric with the behaviour exposed by the POTentry doctest exposed, where different things happen on Python 2 and Python 3 for text_type(b'\xd6'). In that specific example, one raises an exception, one returns nonsense results; calling str on bytes objects in Python 3 is Not What You Want To Do.

Since UTF-8 is a strict superset of ASCII, this shouldn't be narrowing the possible inputs, just expanding them, which seemed reasonable to get consistent behaviour.

This method is called from the __openseen method of the token eater state machine when it gets the closing ) character for the _('translated expression', 'default').

So since this is called from the tokenizer, this may not actually really matter. On all Python versions, the tokenizer produces native strings; on Python 3 it does this by determining the file's encoding using comments or the BOM.

A very good point: we don't want to get "b'Foo'" on Python 3 with code that would raise UnicodeErrors on Python 2.

Especially when the only thing calling this is the tokenizer so it's all moot.

Thank you for taking your time to explain!

mgedmin · 2017-05-02T12:48:29Z

src/zope/app/locales/extract.py

-                           tokenize.COMMENT):
-            # there was no class docstring
-            self.__state = self.__waiting
-


~~Dead code elimination?~~ sorry that was too cryptic

I'm guessing these methods were deleted because they were never called from anywhere?

Correct. I verified in d6b9465 that (a) coverage testing didn't hit them (even though it does hit the nearly identical implementation in pygettext.py) and then (b) that this class implements a manual state machine, and the only way those two methods could be called was if they were manually assigned to self.__state or explicitly called; they weren't; and (c) they were double-dunder private methods so nobody else should be calling them either.

mgedmin · 2017-05-02T12:49:13Z

src/zope/app/locales/extract.py

-                    print >> sys.stderr, ("Could not figure out the i18n domain"
-                                          "for module %s, assuming it is OK" % module_name)
+                    print("Could not figure out the i18n domain"
+                          " for module %s, assuming it is OK" % module_name, file=sys.stderr)


Thank you for fixing that missing space!

mgedmin · 2017-05-02T12:51:41Z

src/zope/app/locales/extract.py

+           "output file: %r\n" \
+           "Python only: %r" \
+           % (base_dir, path, site_zcml, exclude_dirs, domain,
+              include_default_domain, output_file, python_only))


You wouldn't need to indent every line if you didn't add the spurious space after a ( ;)

mgedmin · 2017-05-02T12:54:14Z

src/zope/app/locales/pygettext.py

        # Sort the entries.  First sort each particular entry's keys, then
        # sort all the entries by their first item.
        reverse = {}
        for k, v in self.__messages.items():
-            keys = v.keys()
+            keys = list(v.keys())
            keys.sort()


Use keys = sorted(v.keys()) like elsewhere?

mgedmin · 2017-05-02T12:57:02Z

src/zope/app/locales/tests.py

+        finally:
+            sys.exit = _exit
+            sys.stderr = stderr
+            sys.stdout = stdout


I wonder if using unittest.mock wouldn't make this simpler.

(Well, the code's already been written, so probably not now ;)

It might have...but not on Python 2.7 😄

D'oh! I forgot that the difference between 2.6 and 2.7 was unittest2 (extra assertions, support for skipping), but unittest.mock showed up much later.

mgedmin · 2017-05-02T12:58:48Z

src/zope/app/locales/tests.py

+                                      '-s', 'no such file'], 1)
+        self.assertIn('does not exist', err.getvalue())
+
+        temp = tempfile.mkdtemp()


I really like when every invocation of tempfile.mkdtemp() has a prefix='test-zope-app-locales-' or something similar, so that people have a chance to track down which piece of software leaves clutter behind.

Second, can you maybe use self.addCleanup(shutil.rmtree, temp) here so the directory is deleted even if the test fails?

Very good points. There are other tests that use mkdtemp here and I just did what they did, but there's no reason not to clean them up too.

mgedmin · 2017-05-02T13:01:38Z

src/zope/app/locales/tests.py

+
+    def test_extract(self):
+        me = __file__
+        if me.endswith(".pyc"):


Might as well handle .pyo with .endswith((".pyc", ".pyo")).

Not that I know anyone who runs unit tests of random software with python -O. (I remember people arguing one shouldn't use bare asserts for test in case they want to run the test suite with -O, but I don't believe they actually do it.)

mgedmin · 2017-05-02T13:05:26Z

Anticipating a forthcoming request: @jamadden, you now have PyPI rights to zope.app.locales.

mgedmin · 2017-05-02T13:08:34Z

Re: Python 3.4 -- I say ignore it until somebody files a wishlist bug explaining why they need Python 3.4 support.

- whitespace around a print function - one more switch to sorted() from ().sort() - give temp dirs distinctive names - allow for .pyo files too

jamadden · 2017-05-02T13:46:24Z

Thanks for the reviews!

Support for PyPy and Python 3.5+

0d9a7ac

jamadden force-pushed the python3 branch from 2269331 to 0d9a7ac Compare May 1, 2017 18:25

icemac self-requested a review May 1, 2017 18:43

icemac approved these changes May 1, 2017

View reviewed changes

jamadden mentioned this pull request May 1, 2017

C/Python consitency: C doesn't convert default to unicode zopefoundation/zope.i18nmessageid#5

Closed

jamadden added 3 commits May 1, 2017 15:51

Cleanups

331c145

Add dependency on git version of zope.i18nmessageid for testing. Test…

fcd15ac

… POTEntry equality and sorting.

More test coverage.

d6b9465

- Remove unused functions from extract's TokenEater. - Tests for py_strings. - Pin to correct version of zope.i18nmessageid

mgedmin approved these changes May 2, 2017

View reviewed changes

Cleanups

dbfdedf

- whitespace around a print function - one more switch to sorted() from ().sort() - give temp dirs distinctive names - allow for .pyo files too

jamadden merged commit bd04360 into master May 2, 2017

jamadden deleted the python3 branch May 2, 2017 13:46

Support for PyPy and Python 3.5+ #4

Support for PyPy and Python 3.5+ #4

Conversation

jamadden commented May 1, 2017

icemac left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgedmin May 2, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jamadden commented May 2, 2017

mgedmin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgedmin May 2, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jamadden May 2, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgedmin May 2, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgedmin May 2, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgedmin commented May 2, 2017

mgedmin commented May 2, 2017

jamadden commented May 2, 2017

mgedmin May 2, 2017 •

edited

Loading

mgedmin May 2, 2017 •

edited

Loading

jamadden May 2, 2017 •

edited

Loading

mgedmin May 2, 2017 •

edited

Loading

mgedmin May 2, 2017 •

edited

Loading