Simplify native_to_unicode() & unicode_to_native() #243

cclauss · 2017-08-23T01:54:52Z

The first change [uses feature detection instead of version detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection), and the second [avoids assigning a lambda expression to a variable](https://docs.quantifiedcode.com/python-anti-patterns/correctness/assigning_a_lambda_to_a_variable.html).

martinpopel · 2017-08-23T07:55:09Z

@cclauss: Have you benchmarked this edit? In the original code the condition (if six.PY2) is evaluated just once, but in this PR the condition/exception is evaluated in each method call and there may be many such calls.

cclauss · 2017-08-23T09:39:30Z

@martinpopel What are your thoughts, based on these benchmarks on Python 2 and Python 3?

Call each a million times on Python 2.7.13 (default, Jul 18 2017, 09:17:00) 
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)]
Function               With str      With unicode
old native_to_unicode: 1.90199708939 0.395492076874
new native_to_unicode: 1.53925609589 0.414881944656
old unicode_to_native: 0.75088596344 0.701120853424
new unicode_to_native: 0.85029101371 0.793823003769

Call each a million times on Python 3.6.2 (default, Jul 17 2017, 16:44:45) 
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)]
Function               With str           With unicode
old native_to_unicode: 0.1422363230085466 0.11030975799076259
new native_to_unicode: 0.7890558139770292 0.7672536290192511
old unicode_to_native: 0.1131693619827274 0.10763867301284336
new unicode_to_native: 0.1800369980046525 0.17342810999252833

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# Switch the interpreter above between python2, python3, pypy, and pypy3.

from __future__ import print_function
import six
import sys
import timeit


def native_to_unicode_py2(s):
    """Python 2: transform native string to Unicode."""
    return s if isinstance(s, unicode) else s.decode("utf8")


# Conversion between Unicode and UTF-8, if required (on Python2)
if six.PY2:
    native_to_unicode = native_to_unicode_py2
    unicode_to_native = lambda s: s.encode("utf-8")
else:
    # No conversion required on Python3
    native_to_unicode = lambda s: s
    unicode_to_native = lambda s: s


def new_native_to_unicode(s):
    """Transform native string to Unicode."""
    try:  # Python 2
        return s if isinstance(s, unicode) else s.decode("utf8")
    except NameError:  # Python 3: unicode() was dropped
        return s


def new_unicode_to_native(s):
    """Transform Unicode to native string."""
    return s.encode("utf-8") if six.PY2 else s


print('Call each a million times on Python', sys.version)
print('Function               With str      With unicode')
print('old native_to_unicode:',
      timeit.timeit(
          "native_to_unicode('string')",
          setup="from __main__ import native_to_unicode"),
      timeit.timeit(
          "native_to_unicode(u'Unicöde')",
          setup="from __main__ import native_to_unicode"))
print('new native_to_unicode:',
      timeit.timeit(
          "new_native_to_unicode('string')",
          setup="from __main__ import new_native_to_unicode"),
      timeit.timeit(
          "new_native_to_unicode(u'Unicöde')",
          setup="from __main__ import new_native_to_unicode"))
print('old unicode_to_native:',
      timeit.timeit(
          "unicode_to_native('string')",
          setup="from __main__ import unicode_to_native"),
      timeit.timeit(
          "unicode_to_native(u'Unicöde')",
          setup="from __main__ import unicode_to_native"))
print('new unicode_to_native:',
      timeit.timeit(
          "new_unicode_to_native('string')",
          setup="from __main__ import new_unicode_to_native"),
      timeit.timeit(
          "new_unicode_to_native(u'Unicöde')",
          setup="from __main__ import new_unicode_to_native"))

vthorsteinsson · 2017-08-23T10:55:46Z

As the author of the original PR that is being modified here, I'm a bit sceptical about this change, as it degrades performance significantly under Python 3. However, to avoid the lambda assignment to a variable, a simple change like the following would be easy to do:

if six.PY2:
  def native_to_unicode(s): return s if isinstance(s, unicode) else s.decode("utf8")
  def unicode_to_native(s): return s.encode("utf-8")
else:
  # No conversion required on Python >= 3
  def native_to_unicode(s): return s
  def unicode_to_native(s): return s
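
For comparison, the same idea can be combined with feature detection so that the check still runs only once, at import time. This is a sketch under assumptions, not the code that was eventually merged: the `text_type` name is introduced here for illustration, and the probe relies only on the fact that the `unicode` builtin exists on Python 2 and not on Python 3.

```python
# -*- coding: utf-8 -*-

# Feature detection, evaluated once when the module is imported.
try:
    text_type = unicode  # Python 2: the builtin exists
except NameError:
    text_type = str  # Python 3: str is already Unicode

if text_type is str:
    # Python 3: no conversion required.
    def native_to_unicode(s):
        """Return s unchanged; native strings are already Unicode."""
        return s

    def unicode_to_native(s):
        """Return s unchanged; native strings are already Unicode."""
        return s
else:
    # Python 2: convert between UTF-8 byte strings and Unicode.
    def native_to_unicode(s):
        """Decode a UTF-8 byte string unless it is already Unicode."""
        return s if isinstance(s, text_type) else s.decode("utf-8")

    def unicode_to_native(s):
        """Encode a Unicode string to a UTF-8 byte string."""
        return s.encode("utf-8")
```

Because the try/except runs at module level, no exception is raised or caught inside the functions themselves, so the per-call cost should stay close to that of the original version.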

lukaszkaiser

Ok, let's get it in, but after Google code-style changes it won't look much better than before.

cclauss closed this Aug 23, 2017

cclauss reopened this Aug 23, 2017

A much cleaner approach

33e798a

lukaszkaiser approved these changes Aug 25, 2017

lukaszkaiser merged commit 860fe0a into tensorflow:master Aug 25, 2017

cclauss deleted the patch-2 branch August 25, 2017 03:09
