Dataset padded_batch does not work as documented #35900

@dhpollack

Description

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): A little
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab (ubuntu-based)
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.1.0-rc1
  • Python version: 3.7

Describe the current behavior

Calling Dataset.padded_batch(batch_size, padded_shapes, padding_values=-1) with a scalar padding value on a dataset whose elements are (text, label) tuples fails with the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-6fccac1ccecf> in <module>()
     20 ds_train = ds_train.padded_batch(BATCH_SIZE, padded_shapes)
     21 
---> 22 ds_test = ds_test.padded_batch(BATCH_SIZE, padded_shapes, padding_values=padded_values)

3 frames
/tensorflow-2.1.0/python3.6/tensorflow_core/python/data/util/nest.py in assert_shallow_structure(shallow_tree, input_tree, check_types)
    297       raise TypeError(
    298           "If shallow structure is a sequence, input must also be a sequence. "
--> 299           "Input has type: %s." % type(input_tree))
    300 
    301     if check_types and not isinstance(input_tree, type(shallow_tree)):

TypeError: If shallow structure is a sequence, input must also be a sequence. Input has type: <class 'int'>.

Note that this does not fail if padding_values is left at its default value of None.

Describe the expected behavior

Should pad the data with the value in padding_values.

Also, the error message could be friendlier by telling me what type it expects.
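To illustrate the kind of message meant here, the following is a minimal sketch of a pre-check that names the expected structure. The helper `check_padding_values` is hypothetical and not part of TensorFlow, and it only handles flat tuples or lists, not arbitrarily nested structures.

```python
def check_padding_values(padded_shapes, padding_values):
    """Hypothetical pre-check (not a TensorFlow API).

    Raises a descriptive error when padding_values does not mirror the
    flat tuple/list structure of padded_shapes.
    """
    if padding_values is None:
        return  # None means "use the default padding value per component"

    if isinstance(padded_shapes, (tuple, list)):
        expected = "%s of %d scalars" % (
            type(padded_shapes).__name__, len(padded_shapes))
        if not isinstance(padding_values, (tuple, list)):
            raise TypeError(
                "padding_values must be a %s to match padded_shapes, "
                "but got a value of type %s."
                % (expected, type(padding_values).__name__))
        if len(padding_values) != len(padded_shapes):
            raise ValueError(
                "padding_values must be a %s, but got %d elements."
                % (expected, len(padding_values)))


# Example: the scalar from this report against a two-component structure.
try:
    check_padding_values(((None,), ()), -1)
except TypeError as err:
    print(err)
```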

Code to reproduce the issue

import tensorflow as tf
import tensorflow_datasets as tfds

BATCH_SIZE = 64

ds, ds_info = tfds.load("imdb_reviews/subwords8k", with_info=True, as_supervised=True)
ds_train, ds_test = ds["train"], ds["test"]

output_shapes_train = tf.compat.v1.data.get_output_shapes(ds_train)
padded_shapes = output_shapes_train  # (TensorShape([None]), TensorShape([]))
padded_values = -1

ds_train = ds_train.padded_batch(BATCH_SIZE, padded_shapes)  # does not fail here
ds_test = ds_test.padded_batch(BATCH_SIZE, padded_shapes, padding_values=padded_values)  # but does fail here
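A possible workaround, sketched under the assumption that padding_values has to mirror the tuple structure of the dataset elements (which is what the TypeError above suggests): pass one scalar per component, each with the dtype of that component. The toy generator below stands in for the imdb_reviews data so the snippet runs without a download; its int64 dtypes and the label padding value of 0 are assumptions for illustration.

```python
import tensorflow as tf


def toy_examples():
    # Stand-in for (encoded_text, label) pairs of varying length.
    yield [1, 2, 3], 0
    yield [4, 5], 1


ds = tf.data.Dataset.from_generator(
    toy_examples,
    output_types=(tf.int64, tf.int64),
    output_shapes=(tf.TensorShape([None]), tf.TensorShape([])),
)

padded_shapes = (tf.TensorShape([None]), tf.TensorShape([]))

# One padding value per component, matching the (text, label) structure.
# The label component is a scalar, so its padding value is never used,
# but an entry is still supplied to keep the structure aligned.
padding_values = (
    tf.constant(-1, dtype=tf.int64),
    tf.constant(0, dtype=tf.int64),
)

batched = ds.padded_batch(2, padded_shapes, padding_values=padding_values)
tokens, labels = next(iter(batched))
print(tokens.numpy())  # shorter sequence is padded with -1
print(labels.numpy())
```

Applied to the reproduction above, this would mean replacing `padded_values = -1` with a two-element tuple of that form.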

Other info / logs
https://www.tensorflow.org/api_docs/python/tf/data/Dataset?version=stable#padded_batch

The documentation seems pretty clear that the second call should work.

Labels

TF 2.1 (for tracking issues in 2.1 release), comp:data (tf.data related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), type:bug (Bug)
