Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cope with unicode __all__ in Python 2 packages #19

Merged
merged 2 commits into from
May 3, 2018

Conversation

cjwatson
Copy link
Contributor

@cjwatson cjwatson commented Feb 2, 2018

When porting a package from Python 2 to 3, one natural path involves
adding from __future__ import unicode_literals everywhere to prepare
for the bytes/unicode change. This can cause __all__ to contain
unicode elements, which mostly works but breaks star imports as
follows (depending on the exact Python version - see
https://bugs.python.org/issue21720):

TypeError: Item in ``from list'' not a string

TypeError: Item in ``from list'' must be str, not unicode

Star imports can usually be avoided, but it's hard to avoid this
behaviour of zope.configuration if you're using ZCML, so it seems worth
adjusting ConfigurationContext.resolve slightly to avoid the problem.
The sys.modules logic is borrowed from 2.7's
importlib.import_module. (Using importlib directly here is tricky
because of the care we take with tracebacks, but for the time being we
can still get by with __import__.)

When porting a package from Python 2 to 3, one natural path involves
adding `from __future__ import unicode_literals` everywhere to prepare
for the `bytes`/`unicode` change.  This can cause `__all__` to contain
`unicode` elements, which mostly works but breaks star imports as
follows (depending on the exact Python version - see
https://bugs.python.org/issue21720):

    TypeError: Item in ``from list'' not a string

    TypeError: Item in ``from list'' must be str, not unicode

Star imports can usually be avoided, but it's hard to avoid this
behaviour of zope.configuration if you're using ZCML, so it seems worth
adjusting `ConfigurationContext.resolve` slightly to avoid the problem.
The `sys.modules` logic is borrowed from 2.7's
`importlib.import_module`.  (Using `importlib` directly here is tricky
because of the care we take with tracebacks, but for the time being we
can still get by with `__import__`.)
@jamadden
Copy link
Member

jamadden commented Feb 2, 2018

It seems to me that this scenario is considered an error in Python 2. The response of the Python core devs in 2.7.13rc1 was to make a better error message:

Improve exception message when the type of fromlist is unicode. fromlist
parameter of __import__() only accepts str in Python 2 and this will help to
identify the problem especially when the unicode_literals future import is
used.

It's not clear to me why this code should support packages that are in violation of the language spec. This issue is going to be lurking for those packages when used elsewhere.

That said, the changes to the code are fairly innocuous, so I'd only be a -0 on this personally. @tseaver may have a different opinion (I know he considers unicode_literals to be a code smell, and I've come to agree).

@cjwatson
Copy link
Contributor Author

cjwatson commented Feb 2, 2018

Right, I understand that this is at best a bit weird in Python 2, but it's relatively easy to work around here and IMO it's useful to do so.

unicode_literals can be smelly in library code, since it can easily introduce API changes. I wouldn't advocate using it without caution. However, in leaf application code it can be a very useful way to make progress on transitioning. In particular, the strategy I'm following in Launchpad (which is where I ran across this) is to first change the tests to use unicode_literals (watching out for the handful of cases that really need bytes instead), and make sure everything works there; once that works, it will be much easier to change the actual application code; and once we have appropriate versions of all our dependencies we can switch to py3-only. I don't think this strategy runs afoul of the usual arguments against unicode_literals. So far this is the first non-trivial problem I've run into.

@tseaver
Copy link
Member

tseaver commented Feb 2, 2018

I'm not really in favor of encouraging the __unicode_literals__ hackage: it is an attractive nuisance, because it means the author / maintainer is punting on thinking about the semantics the literals in a module. This consideration gets important because there are actually three cases which matter: must be bytes, must be text, and must be native str (important for dealing with some stdlib modules -- I'm looking at you, email).

@cjwatson
Copy link
Contributor Author

cjwatson commented Feb 3, 2018

Or, as in our case, it means that we know that virtually all the literals in legacy code are in fact text even if they lack an explicit u prefix (because we're a database-heavy webapp, and nearly all the literals are either strings destined for a text column or parts of responses that we only want to convert to bytes at the boundary), and going through three-quarters of a million lines of code adding unicode_literals and annotating just the few exceptions to this is an awful lot less arduous than going through adding u prefixes to nearly every single string.

Copy link
Member

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had my own unpleasant encounters with unicode_literals (certain pieces of the WSGI stack do not like it when other pieces put unicode keys in the WSGI dict!) and consider it a Bad Idea, but I don't mind making the life easier for people who haven't burned their fingers yet, especially if it makes our code nicer (no more dead_chickens!).

@@ -148,7 +146,8 @@ def resolve(self, dottedname):
oname = ''

try:
mod = __import__(mname, *_import_chickens)
__import__(mname)
mod = sys.modules[mname]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never liked the _import_chickens magic. I can't say I like the replacement too much either.

I wonder, could we use mod = importlib.import_module(mname) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importlib.import_module is I think possible (and in Python 2 it's more or less identical to the code above with some extra elaboration), and it would solve the problem at hand too.

The difficulty is the exception handling: because the actual import is now one or more levels down the stack, depending on the Python version, in order to distinguish between an ImportError caused by the immediate module not existing and something deeper in the stack we'd need to walk through any ImportError traceback manually and stop when the frames stop referring to importlib. If people would prefer that (a happy conjunction of making more idiomatic use of modern Python APIs and solving the particular edge case I ran into), then I can certainly have another go, but I'd assumed that it would be too complex a change for the task at hand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I did not notice the depth checking bits. __import__ is fine then.

(I'd add a comment explaining why mod = __import__(mname) would not work and why the sys.modules[] lookup is necessary, but that's just me.)

for name in ('zope.configuration.tests.unicode_all',
'zope.configuration.tests.unicode_all.__future__'):
if name in sys.modules:
del sys.modules[name]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer the cleanup to be in a tearDown method (or implemented via self.addCleanup(sys.modules.pop, 'zope....', None) maybe).

TBH I'm not sure cleanup is necessary here. We don't usually consider actual modules getting imported into sys.modules to be a side effect worth cleaning up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was clone-and-hack from other tests. I guess that it's relevant in those cases because multiple test cases try to import the same test modules, and that won't work so well without the cleanup?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency with other tests is a good-enough reason.

@@ -148,7 +146,8 @@ def resolve(self, dottedname):
oname = ''

try:
mod = __import__(mname, *_import_chickens)
__import__(mname)
mod = sys.modules[mname]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I did not notice the depth checking bits. __import__ is fine then.

(I'd add a comment explaining why mod = __import__(mname) would not work and why the sys.modules[] lookup is necessary, but that's just me.)

for name in ('zope.configuration.tests.unicode_all',
'zope.configuration.tests.unicode_all.__future__'):
if name in sys.modules:
del sys.modules[name]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency with other tests is a good-enough reason.

@icemac
Copy link
Member

icemac commented Mar 7, 2018

Is there a consensus that this PR can be merged as is?

@mgedmin
Copy link
Member

mgedmin commented Mar 7, 2018

There's an LGTM from me, and it looks like nobody else wanted to review, so I'd say yes?

(I've a habit of not merging pull requests for ZopeFoundation repositories, due to the committer agreement requirement. People who have signed it can merge themselves, and if people haven't signed it, then I'm not supposed to merge their code anyway.)

@icemac
Copy link
Member

icemac commented Apr 19, 2018

@cjwatson Did you sign a Zope contributor agreement, to be able to hit the big green merge button?

@cjwatson
Copy link
Contributor Author

cjwatson commented May 3, 2018

Ah, sorry, I think I missed that something resembling consensus had emerged. Yes, I've signed the contributor agreement. I've added a comment per review and will merge.

@cjwatson cjwatson merged commit 4a4c1a0 into zopefoundation:master May 3, 2018
@cjwatson cjwatson deleted the unicode-all branch May 3, 2018 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants