Cope with unicode all in Python 2 packages #19

cjwatson · 2018-02-02T13:51:34Z

When porting a package from Python 2 to 3, one natural path involves
adding from __future__ import unicode_literals everywhere to prepare
for the bytes/unicode change. This can cause __all__ to contain
unicode elements, which mostly works but breaks star imports as
follows (depending on the exact Python version - see
https://bugs.python.org/issue21720):

TypeError: Item in ``from list'' not a string

TypeError: Item in ``from list'' must be str, not unicode

Star imports can usually be avoided, but it's hard to avoid this
behaviour of zope.configuration if you're using ZCML, so it seems worth
adjusting ConfigurationContext.resolve slightly to avoid the problem.
The sys.modules logic is borrowed from 2.7's
importlib.import_module. (Using importlib directly here is tricky
because of the care we take with tracebacks, but for the time being we
can still get by with __import__.)

When porting a package from Python 2 to 3, one natural path involves adding `from __future__ import unicode_literals` everywhere to prepare for the `bytes`/`unicode` change. This can cause `__all__` to contain `unicode` elements, which mostly works but breaks star imports as follows (depending on the exact Python version - see https://bugs.python.org/issue21720): TypeError: Item in ``from list'' not a string TypeError: Item in ``from list'' must be str, not unicode Star imports can usually be avoided, but it's hard to avoid this behaviour of zope.configuration if you're using ZCML, so it seems worth adjusting `ConfigurationContext.resolve` slightly to avoid the problem. The `sys.modules` logic is borrowed from 2.7's `importlib.import_module`. (Using `importlib` directly here is tricky because of the care we take with tracebacks, but for the time being we can still get by with `__import__`.)

jamadden · 2018-02-02T14:27:19Z

It seems to me that this scenario is considered an error in Python 2. The response of the Python core devs in 2.7.13rc1 was to make a better error message:

Improve exception message when the type of fromlist is unicode. fromlist
parameter of __import__() only accepts str in Python 2 and this will help to
identify the problem especially when the unicode_literals future import is
used.

It's not clear to me why this code should support packages that are in violation of the language spec. This issue is going to be lurking for those packages when used elsewhere.

That said, the changes to the code are fairly innocuous, so I'd only be a -0 on this personally. @tseaver may have a different opinion (I know he considers unicode_literals to be a code smell, and I've come to agree).

cjwatson · 2018-02-02T14:46:08Z

Right, I understand that this is at best a bit weird in Python 2, but it's relatively easy to work around here and IMO it's useful to do so.

unicode_literals can be smelly in library code, since it can easily introduce API changes. I wouldn't advocate using it without caution. However, in leaf application code it can be a very useful way to make progress on transitioning. In particular, the strategy I'm following in Launchpad (which is where I ran across this) is to first change the tests to use unicode_literals (watching out for the handful of cases that really need bytes instead), and make sure everything works there; once that works, it will be much easier to change the actual application code; and once we have appropriate versions of all our dependencies we can switch to py3-only. I don't think this strategy runs afoul of the usual arguments against unicode_literals. So far this is the first non-trivial problem I've run into.

tseaver · 2018-02-02T18:58:29Z

I'm not really in favor of encouraging the __unicode_literals__ hackage: it is an attractive nuisance, because it means the author / maintainer is punting on thinking about the semantics the literals in a module. This consideration gets important because there are actually three cases which matter: must be bytes, must be text, and must be native str (important for dealing with some stdlib modules -- I'm looking at you, email).

cjwatson · 2018-02-03T15:15:48Z

Or, as in our case, it means that we know that virtually all the literals in legacy code are in fact text even if they lack an explicit u prefix (because we're a database-heavy webapp, and nearly all the literals are either strings destined for a text column or parts of responses that we only want to convert to bytes at the boundary), and going through three-quarters of a million lines of code adding unicode_literals and annotating just the few exceptions to this is an awful lot less arduous than going through adding u prefixes to nearly every single string.

mgedmin

I've had my own unpleasant encounters with unicode_literals (certain pieces of the WSGI stack do not like it when other pieces put unicode keys in the WSGI dict!) and consider it a Bad Idea, but I don't mind making the life easier for people who haven't burned their fingers yet, especially if it makes our code nicer (no more dead_chickens!).

mgedmin · 2018-02-04T18:22:47Z

src/zope/configuration/config.py

@@ -148,7 +146,8 @@ def resolve(self, dottedname):
            oname = ''

        try:
-            mod = __import__(mname, *_import_chickens)
+            __import__(mname)
+            mod = sys.modules[mname]


I never liked the _import_chickens magic. I can't say I like the replacement too much either.

I wonder, could we use mod = importlib.import_module(mname) here?

importlib.import_module is I think possible (and in Python 2 it's more or less identical to the code above with some extra elaboration), and it would solve the problem at hand too.

The difficulty is the exception handling: because the actual import is now one or more levels down the stack, depending on the Python version, in order to distinguish between an ImportError caused by the immediate module not existing and something deeper in the stack we'd need to walk through any ImportError traceback manually and stop when the frames stop referring to importlib. If people would prefer that (a happy conjunction of making more idiomatic use of modern Python APIs and solving the particular edge case I ran into), then I can certainly have another go, but I'd assumed that it would be too complex a change for the task at hand.

Ah, I did not notice the depth checking bits. __import__ is fine then.

(I'd add a comment explaining why mod = __import__(mname) would not work and why the sys.modules[] lookup is necessary, but that's just me.)

mgedmin · 2018-02-04T18:26:05Z

src/zope/configuration/tests/test_config.py

+        for name in ('zope.configuration.tests.unicode_all',
+                     'zope.configuration.tests.unicode_all.__future__'):
+            if name in sys.modules:
+                del sys.modules[name]


I would prefer the cleanup to be in a tearDown method (or implemented via self.addCleanup(sys.modules.pop, 'zope....', None) maybe).

TBH I'm not sure cleanup is necessary here. We don't usually consider actual modules getting imported into sys.modules to be a side effect worth cleaning up.

This was clone-and-hack from other tests. I guess that it's relevant in those cases because multiple test cases try to import the same test modules, and that won't work so well without the cleanup?

Consistency with other tests is a good-enough reason.

mgedmin · 2018-02-05T11:08:15Z

src/zope/configuration/config.py

@@ -148,7 +146,8 @@ def resolve(self, dottedname):
            oname = ''

        try:
-            mod = __import__(mname, *_import_chickens)
+            __import__(mname)
+            mod = sys.modules[mname]


Ah, I did not notice the depth checking bits. __import__ is fine then.

(I'd add a comment explaining why mod = __import__(mname) would not work and why the sys.modules[] lookup is necessary, but that's just me.)

mgedmin · 2018-02-05T11:08:42Z

src/zope/configuration/tests/test_config.py

+        for name in ('zope.configuration.tests.unicode_all',
+                     'zope.configuration.tests.unicode_all.__future__'):
+            if name in sys.modules:
+                del sys.modules[name]


Consistency with other tests is a good-enough reason.

icemac · 2018-03-07T07:23:32Z

Is there a consensus that this PR can be merged as is?

mgedmin · 2018-03-07T11:08:36Z

There's an LGTM from me, and it looks like nobody else wanted to review, so I'd say yes?

(I've a habit of not merging pull requests for ZopeFoundation repositories, due to the committer agreement requirement. People who have signed it can merge themselves, and if people haven't signed it, then I'm not supposed to merge their code anyway.)

icemac · 2018-04-19T06:06:00Z

@cjwatson Did you sign a Zope contributor agreement, to be able to hit the big green merge button?

cjwatson · 2018-05-03T13:20:48Z

Ah, sorry, I think I missed that something resembling consensus had emerged. Yes, I've signed the contributor agreement. I've added a comment per review and will merge.

mgedmin reviewed Feb 4, 2018

View reviewed changes

mgedmin approved these changes Feb 5, 2018

View reviewed changes

Add rationale comment

0546515

cjwatson merged commit 4a4c1a0 into zopefoundation:master May 3, 2018

cjwatson deleted the unicode-all branch May 3, 2018 13:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cope with unicode all in Python 2 packages #19

Cope with unicode all in Python 2 packages #19

cjwatson commented Feb 2, 2018

jamadden commented Feb 2, 2018

cjwatson commented Feb 2, 2018

tseaver commented Feb 2, 2018

cjwatson commented Feb 3, 2018

mgedmin left a comment •

edited

Loading

mgedmin Feb 4, 2018

cjwatson Feb 4, 2018

mgedmin Feb 5, 2018

mgedmin Feb 4, 2018

cjwatson Feb 4, 2018

mgedmin Feb 5, 2018

mgedmin Feb 5, 2018

mgedmin Feb 5, 2018

icemac commented Mar 7, 2018

mgedmin commented Mar 7, 2018

icemac commented Apr 19, 2018

cjwatson commented May 3, 2018

Cope with unicode __all__ in Python 2 packages #19

Cope with unicode __all__ in Python 2 packages #19

Conversation

cjwatson commented Feb 2, 2018

jamadden commented Feb 2, 2018

cjwatson commented Feb 2, 2018

tseaver commented Feb 2, 2018

cjwatson commented Feb 3, 2018

mgedmin left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

icemac commented Mar 7, 2018

mgedmin commented Mar 7, 2018

icemac commented Apr 19, 2018

cjwatson commented May 3, 2018

Cope with unicode all in Python 2 packages #19

Cope with unicode all in Python 2 packages #19

mgedmin left a comment •

edited

Loading