Conversion from ImageNet tarballs to TFRecords still requires Python 2 #8653

@nobutoba

Prerequisites

  • I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • I checked to make sure that this issue has not already been filed.

Note: I am using the latest TensorFlow Model Garden release, but with TensorFlow 1 rather than TensorFlow 2, because the TensorFlow-Slim Image Classification Model Library is not compatible with TensorFlow 2.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/slim/datasets/build_imagenet_data.py

2. Describe the bug

According to the BUILD file (python_version = "PY3"), the script build_imagenet_data.py is supposed to be compatible with Python 3:

py_binary(
    name = "build_imagenet_data",
    srcs = ["datasets/build_imagenet_data.py"],
    python_version = "PY3",
    deps = [
        "//learning/brain/public:disable_tf2",  # build_cleaner: keep; go/disable_tf2
        # "//numpy",
        "//third_party/py/six",
        # "//tensorflow",
    ],
)

However, it currently only works with Python 2.

3. Steps to reproduce

If I run build_imagenet_data.py with Python 3, it raises a TypeError:

Traceback (most recent call last):
  ...
  (snip)
  ...
  File "/apps/python/3.7.6/lib/python3.7/random.py", line 278, in shuffle
    x[i], x[j] = x[j], x[i]
TypeError: 'range' object does not support item assignment
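The failure can be reproduced without any ImageNet data. This is a minimal sketch, assuming Python 3. num_files is an arbitrary stand-in for len(filenames), and the variable names are only for illustration:

```python
import random

num_files = 10  # hypothetical stand-in for len(filenames)

# Python 3: range() returns an immutable range object, not a list.
shuffled_index = range(num_files)
random.seed(12345)

error_message = None
try:
    # random.shuffle permutes its argument in place via x[i], x[j] = x[j], x[i].
    random.shuffle(shuffled_index)
except TypeError as err:
    error_message = str(err)

print(error_message)  # → 'range' object does not support item assignment
```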

4. Expected behavior

The script build_imagenet_data.py is expected to work properly with both Python 2 and 3.

5. Additional context

  • The error comes from the following lines in build_imagenet_data.py:

      ...
      # Shuffle the ordering of all image files in order to guarantee
      # random ordering of the images with respect to label in the
      # saved TFRecord files. Make the randomization repeatable.
      shuffled_index = range(len(filenames))
      random.seed(12345)
      random.shuffle(shuffled_index)
      ...

    In Python 2 the built-in range() returns a list, but in Python 3 it returns an immutable range object. random.shuffle permutes its argument in place by item assignment (x[i], x[j] = x[j], x[i]), which a range object does not support, hence the TypeError above.

  • A simple fix would be to replace the definition of shuffled_index above with:

    shuffled_index = [x for x in range(len(filenames))]

    or an equivalent such as list(range(len(filenames))), so that shuffled_index is a list regardless of the Python version.

  • Python 3 compatibility was declared in the BUILD file by a commit made two weeks before this issue was filed.
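The proposed fix can be sketched as a small standalone function. shuffled_indices is a hypothetical helper name, not part of build_imagenet_data.py; it keeps the script's fixed seed of 12345 and uses list(range(...)), which behaves the same as the list comprehension above. The file names in the usage block are made up for illustration.

```python
import random


def shuffled_indices(num_files, seed=12345):
    """Return a repeatable random permutation of 0..num_files-1.

    Hypothetical helper mirroring the fixed lines in build_imagenet_data.py.
    """
    # list() materializes the range so random.shuffle can swap items in place.
    # This works under Python 2 (where range() already returns a list) and Python 3.
    shuffled_index = list(range(num_files))
    random.seed(seed)  # reseeds the global generator, as the original script does
    random.shuffle(shuffled_index)
    return shuffled_index


if __name__ == "__main__":
    filenames = ["a.JPEG", "b.JPEG", "c.JPEG", "d.JPEG"]  # hypothetical file list
    order = shuffled_indices(len(filenames))
    # Reseeding with the same value yields the same permutation on repeated calls.
    assert order == shuffled_indices(len(filenames))
    print([filenames[i] for i in order])
```

The sketch reseeds the global generator to stay close to the original lines; a dedicated random.Random(seed) instance would give the same repeatability without touching global random state.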

6. System information

  • OS Platform and Distribution: CentOS Linux 7
  • Mobile device name if the issue happens on a mobile device: N/A
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.15.3
  • Python version: 3.7.6
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 10.0.130.1/7.6.5
  • GPU model and memory: Tesla V100 with 16160MiB memory

Metadata

Labels

  • models:research: models that come under research directory
  • type:bug: Bug in the code
