This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

[bugfix] - COCO dataset reads images as text files #1466

Merged (1 commit) on Mar 1, 2019


hbrylkowski
Contributor

When I tried to run tensor2tensor on the COCO dataset, I got this error:

Traceback (most recent call last):
  [...]
  File "/root/.local/lib/python3.5/site-packages/tensor2tensor/bin/t2t_trainer.py", line 401, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python3.5/site-packages/tensor2tensor/bin/t2t_trainer.py", line 384, in main
    generate_data()
  File "/root/.local/lib/python3.5/site-packages/tensor2tensor/bin/t2t_trainer.py", line 279, in generate_data
    registry.problem(problem_name).generate_data(data_dir, tmp_dir)
  File "/root/.local/lib/python3.5/site-packages/tensor2tensor/data_generators/image_utils.py", line 372, in generate_data
    self.dev_filepaths(data_dir, self.dev_shards, shuffled=False))
  File "/root/.local/lib/python3.5/site-packages/tensor2tensor/data_generators/generator_utils.py", line 490, in generate_dataset_and_shuffle
    generate_files(train_gen, train_paths)
  File "/root/.local/lib/python3.5/site-packages/tensor2tensor/data_generators/generator_utils.py", line 165, in generate_files
    for case in generator:
  File "/root/.local/lib/python3.5/site-packages/tensor2tensor/data_generators/mscoco.py", line 129, in mscoco_generator
    encoded_image_data = f.read()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/lib/io/file_io.py", line 132, in read
    pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/lib/io/file_io.py", line 100, in _prepare_value
    return compat.as_str_any(val)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/compat.py", line 107, in as_str_any
    return as_str(value)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/compat.py", line 80, in as_text
    return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

This PR makes a single change: images are now read as binary files (raw bytes) instead of as text.
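
For illustration, here is a minimal sketch of why the read mode matters. It uses Python's built-in `open` rather than the TensorFlow file API that appears in the traceback, and it is not the actual diff of this PR. The helper names `read_as_text`, `read_as_bytes`, and `demo`, and the four stand-in bytes, are hypothetical.

```python
import os
import tempfile

# Typical first bytes of a JPEG file: the SOI marker 0xFF 0xD8, then 0xFF 0xE0.
# This is stand-in data for the sketch, not a complete image.
JPEG_HEADER = b"\xff\xd8\xff\xe0"


def read_as_text(path):
    """Read the file in text mode, decoding its contents as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def read_as_bytes(path):
    """Read the file in binary mode, returning raw bytes with no decoding."""
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Return (error raised by the text-mode read, bytes from the binary read)."""
    fd, path = tempfile.mkstemp(suffix=".jpg")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(JPEG_HEADER)
        try:
            read_as_text(path)
            text_error = None
        except UnicodeDecodeError as e:
            # 0xFF can never start a valid UTF-8 sequence, so decoding fails at byte 0.
            text_error = e
        data = read_as_bytes(path)
    finally:
        os.remove(path)
    return text_error, data


if __name__ == "__main__":
    err, data = demo()
    print("text mode:", err)
    print("binary mode:", data)
```

In this sketch the text-mode read fails on the very first byte, in the same way as the traceback above, while the binary read returns the bytes untouched, which is what an image encoder field needs.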

googlebot added the "cla: yes" (PR author has signed CLA) label on Feb 23, 2019
@afrozenator
Contributor

Thanks, @hbrylkowski!

afrozenator merged commit f0f0948 into tensorflow:master on Mar 1, 2019
tensorflow-copybara pushed a commit that referenced this pull request Mar 1, 2019
PiperOrigin-RevId: 236376505
kpe pushed a commit to kpe/tensor2tensor that referenced this pull request Mar 2, 2019
PiperOrigin-RevId: 236376505
@ZainySong

Sorry to bother you, but may I ask a question about t2t-decoder? How should I use t2t-decoder when the input file is an image? I trained an Image Transformer model on Cifar10, but t2t-decoder fails every time with this error:

/compat.py", line 80, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Could you share the command you used for decoding with the COCO dataset? Or do you know how I can solve this error?
