Simplify native_to_unicode() & unicode_to_native() #243
The first [uses feature detection, instead of version detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection) and the second [avoids assigning a lambda expression to a variable](https://docs.quantifiedcode.com/python-anti-patterns/correctness/assigning_a_lambda_to_a_variable.html).
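A minimal sketch of feature detection for this case. It assumes the probe for the `unicode` builtin runs once at import time, so it is not the PR's exact code, which runs the check inside the function. The name `text_type` is a hypothetical helper introduced for illustration:

```python
# Feature detection: probe for the builtin we need (`unicode`) instead of
# checking the interpreter's version number. This runs once, at import time.
try:
    text_type = unicode  # noqa: F821  # Python 2: the builtin exists
except NameError:
    text_type = str  # Python 3: `unicode` was dropped; str is already Unicode


def native_to_unicode(s):
    """Return `s` as Unicode, decoding UTF-8 bytes if necessary."""
    return s if isinstance(s, text_type) else s.decode("utf-8")
```

Because the `try`/`except` sits at module level, each call only pays for the `isinstance` check, not for a failed name lookup.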
@cclauss: Have you benchmarked this edit? In the original code the condition (`if six.PY2`) is evaluated just once, at import time, but in this PR the condition/exception is evaluated on every function call, and there may be many such calls.
@martinpopel What are your thoughts, based on these benchmarks on Python 2 and Python 3?

```python
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# Switch the interpreter above between python2, python3, pypy, and pypy3.
from __future__ import print_function
import six
import sys
import timeit


def native_to_unicode_py2(s):
    """Python 2: transform native string to Unicode."""
    return s if isinstance(s, unicode) else s.decode("utf8")


# Conversion between Unicode and UTF-8, if required (on Python2)
if six.PY2:
    native_to_unicode = native_to_unicode_py2
    unicode_to_native = lambda s: s.encode("utf-8")
else:
    # No conversion required on Python3
    native_to_unicode = lambda s: s
    unicode_to_native = lambda s: s


def new_native_to_unicode(s):
    """Transform native string to Unicode."""
    try:  # Python 2
        return s if isinstance(s, unicode) else s.decode("utf8")
    except NameError:  # Python 3: unicode() was dropped
        return s


def new_unicode_to_native(s):
    """Transform Unicode to native string."""
    return s.encode("utf-8") if six.PY2 else s


print('Call each a million times on Python', sys.version)
print('Function With str With unicode')
print('old native_to_unicode:',
      timeit.timeit(
          "native_to_unicode('string')",
          setup="from __main__ import native_to_unicode"),
      timeit.timeit(
          "native_to_unicode(u'Unicöde')",
          setup="from __main__ import native_to_unicode"))
print('new native_to_unicode:',
      timeit.timeit(
          "new_native_to_unicode('string')",
          setup="from __main__ import new_native_to_unicode"),
      timeit.timeit(
          "new_native_to_unicode(u'Unicöde')",
          setup="from __main__ import new_native_to_unicode"))
print('old unicode_to_native:',
      timeit.timeit(
          "unicode_to_native('string')",
          setup="from __main__ import unicode_to_native"),
      timeit.timeit(
          "unicode_to_native(u'Unicöde')",
          setup="from __main__ import unicode_to_native"))
print('new unicode_to_native:',
      timeit.timeit(
          "new_unicode_to_native('string')",
          setup="from __main__ import new_unicode_to_native"),
      timeit.timeit(
          "new_unicode_to_native(u'Unicöde')",
          setup="from __main__ import new_unicode_to_native"))
```
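The script above takes a single `timeit.timeit` run per case. Python's `timeit` documentation recommends repeating a measurement and looking at the minimum, because higher values are usually caused by other processes interfering. Below is a Python 3-only sketch of that approach for `native_to_unicode`. The two function bodies mirror the Python 3 paths in the script; `best_time` and its `number`/`repeat` defaults are hypothetical choices made for illustration:

```python
import timeit


def old_native_to_unicode(s):
    """Python 3 branch of the original code: identity, chosen once at import."""
    return s


def new_native_to_unicode(s):
    """PR version: probes for `unicode` on every call."""
    try:
        return s if isinstance(s, unicode) else s.decode("utf8")  # noqa: F821
    except NameError:  # Python 3: the failed lookup is raised and caught per call
        return s


def best_time(func, arg, number=100000, repeat=5):
    """Smallest of `repeat` timings, each of `number` calls, in seconds."""
    # The lambda wrapper adds the same overhead to both functions being compared.
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    old = best_time(old_native_to_unicode, "string")
    new = best_time(new_native_to_unicode, "string")
    print("old: %.4f s  new: %.4f s  ratio: %.1fx" % (old, new, new / old))
```

Passing a callable to `timeit.Timer` avoids the `from __main__ import ...` setup strings, at the cost of one extra function call per iteration.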
As the author of the original PR that is being modified here, I'm a bit sceptical about this change, as it degrades performance significantly under Python 3. However, to avoid assigning a lambda expression to a variable, a simple change like the following would be easy to do:
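A hypothetical sketch of one such change, assuming the goal is to keep the single import-time branch of the original code and only replace the lambdas with `def` statements. It may differ from the snippet the commenter had in mind. To stay dependency-free it spells out the version test directly; `six.PY2` is defined as this same `sys.version_info[0] == 2` comparison:

```python
import sys

PY2 = sys.version_info[0] == 2  # the same test six.PY2 performs

if PY2:
    def native_to_unicode(s):
        """Python 2: decode a native (byte) string to Unicode."""
        return s if isinstance(s, unicode) else s.decode("utf-8")  # noqa: F821

    def unicode_to_native(s):
        """Python 2: encode Unicode to a native (byte) string."""
        return s.encode("utf-8")
else:
    def native_to_unicode(s):
        """Python 3: native strings are already Unicode."""
        return s

    def unicode_to_native(s):
        """Python 3: native strings are already Unicode."""
        return s
```

The branch is still evaluated only once. In addition, the functions carry real names (`native_to_unicode.__name__` is `'native_to_unicode'` rather than `'<lambda>'`), which shows up in tracebacks and profiler output.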
lukaszkaiser left a comment:
Ok, let's get it in, but after Google code-style changes it won't look much better than before.