contrib.data.Dataset - doc issue with Dataset.map / tf.py_func in 1.3.0rc0 #11786

cheind · 2017-07-26T14:14:49Z

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow installed from (source or binary): 1.3.0rc0
Python version: 3.6
CUDA/cuDNN version: 8/6
GPU model and memory: GTX 1080
Exact command to reproduce:

The following sample is taken from here and works in TF 1.2.1

import tensorflow as tf
import numpy as np

def _read_py_function(filename, label):
  return np.zeros((100,100,1)), label

def _resize_function(image_decoded, label):
  image_decoded.set_shape([None, None, None])
  image_resized = tf.image.resize_images(image_decoded, [28, 28])
  return image_resized, label

filenames = np.array(["/var/data/image1.jpg", "/var/data/image2.jpg"])
labels = np.array([0, 37])

dataset = tf.contrib.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(
    lambda filename, label: tf.py_func(
        _read_py_function, [filename, label], [tf.uint8, label.dtype]))
dataset = dataset.map(_resize_function)

In 1.3.0rc0 the following error is produced

Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'uint8'> (Tensor is: <tf.Tensor 'PyFunc:1' shape=<unknown> dtype=int32>)

This is due to the breaking change mentioned in release notes. To fix, one now has to introduce an explicit tuple() like so

dataset = dataset.map(
    lambda filename, label: tuple(tf.py_func(
        _read_py_function, [filename, label], [tf.uint8, label.dtype])))

This should at least be mentioned in the API docs / programmer guide.

The text was updated successfully, but these errors were encountered:

aselle · 2017-07-26T17:37:55Z

@mrry, could you update the documentation to reflect this change?

aselle · 2017-07-26T17:38:14Z

Thanks @cheind for reporting this.

cheind · 2017-07-27T04:30:01Z

@mrry You're welcome. While this is certainly a doc issue at this point, I want to raise the concern that assigning tuple and list totally different semantics in Python is very uncommon (even in Tensorflow) and could lead to many suprising moments on user side.

Fixes tensorflow#11786 PiperOrigin-RevId: 163359134

END_PUBLIC I dropped the following commit because it doesn't compile. I will follow up with Andrew to fix it or revert it. Commit 003deb8 authored by osdamv<osdamv@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: Refactor and implementation of the camera API 1, it fixes tensorflow#8736 (tensorflow#10771) List of commits in this CL: --- Commit 4464503 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Use identity of param variable in cudnn_rnn.RNNParamsSaveable instead of parameter variable directly. The RNNParamsSaveable is usually used in a graph which also has a saver for the cudnn param variable itself, if the same op is used for both, fails with a two savers for same op error. PiperOrigin-RevId: 163431826 --- Commit d629a83 authored by RJ Ryan<rjryan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Increase bound on tf.contrib.signal.inverse_stft gradient error to avoid flakiness on macOS. PiperOrigin-RevId: 163426631 --- Commit 253bcbb authored by Kay Zhu<kayzhu@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Use HloEvaluator for convolution in reference_util. Also Speed up HloEvaluator's HandleConvolution in non-opt build, by moving calls to HloInstruction::shape() out of the inner loop. PiperOrigin-RevId: 163416183 --- Commit 569a00e authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update API to traffic in unique_ptrs rather than owning raw pointers PiperOrigin-RevId: 163414320 --- Commit 31a77bc authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Java: Update release to 1.3.0-rc1 PiperOrigin-RevId: 163413736 --- Commit 1ebbf43 authored by Jonathan Hseu<vomjom@vomjom.net> Committed by GitHub<noreply@github.com>: Add missing grpc dependency (tensorflow#11828) --- Commit 905abb1 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Test asserts should have `expected` first. PiperOrigin-RevId: 163409348 --- Commit d5cc143 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Increase timeout to deflake the test. PiperOrigin-RevId: 163407824 --- Commit ce1c7f0 authored by Eli Bendersky<eliben@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Properly include logging header in xla_internal_test_main PiperOrigin-RevId: 163405986 --- Commit 22241cd authored by joetoth<joetoth@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: External leveldb link changed (tensorflow#11833) table_format.txt was renamed to table_format.md --- Commit 6b7314d authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Consolidating the code to fill the partition's function library into one place. Previously, Partition() and MasterSession::RegisterPartition() both fills in the partitioned graph's function library. PiperOrigin-RevId: 163400992 --- Commit 28373cf authored by Frank Chen<frankchn@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Adds preliminary support for Cloud TPUs with Cluster Resolvers. This aims to allow users to have a better experienec when specifying one or multiple Cloud TPUs for their training jobs by allowing users to use names rather than IP addresses. PiperOrigin-RevId: 163393443 --- Commit e5353c9 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Don't prune nodes that have reference inputs. PiperOrigin-RevId: 163390862 --- Commit 2265108 authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: C API: Groundwork for experimenting with TF_Tensor in device memory. TF_Tensor objects are always backed by host memory. This commit lays the groundwork for allowing TF_Tensor objects to refer to tensor data on device (e.g., GPU) memory. PiperOrigin-RevId: 163388079 --- Commit 613bf1c authored by Yuefeng Zhou<yuefengz@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: fix asan test failure in SingleMachineTest::ReleaseMemoryAfterDestruction. PiperOrigin-RevId: 163386941 --- Commit 4653d37 authored by Eli Bendersky<eliben@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Change type to appease GPU builds. PiperOrigin-RevId: 163384927 --- Commit 9f131bd authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Internal change PiperOrigin-RevId: 163378484 --- Commit 8bc0236 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: PiperOrigin-RevId: 163366493 --- Commit 3b97f1f authored by Yangzihao Wang<yangzihao@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Change to only run one round of matmul benchmark. PiperOrigin-RevId: 163364341 --- Commit a4a3a33 authored by Yun Peng<pcloudy@google.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix ./configure on Windows (tensorflow#11775) * Fix ./configure on Windows * Disable bitwise_ops_test on Windows --- Commit ae3119d authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Small changes to op framework. PiperOrigin-RevId: 163361071 --- Commit f40189d authored by qjivy<ji.qiu@spreadtrum.com> Committed by Vijay Vasudevan<vrv@google.com>: PR again: Enable building label_image with jpeg/gif/png decoder for Android. (tensorflow#11475) * Enable building label_image with jpeg/gif/png decoder for Android. Add dependency "android_tesnorflow_image_op" to label_image, which is not overlapped with android_tensorflow_kernels. * Running buildifier to reformat the BUILD files for sanity check. --- Commit 5991658 authored by KB Sriram<kbsriram@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: Add the Constant operator class (tensorflow#11559) Create a custom operator class to create constants in the Graph, and introduce the Operator marker annotation to identify operator classes. Please see tensorflow#7149 for the master tracking issue. --- Commit 86ca350 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Further BUILD cleanup PiperOrigin-RevId: 163360750 --- Commit 376bb06 authored by Pete Warden<petewarden@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Look inside functions to see which node types are used. PiperOrigin-RevId: 163360375 --- Commit 2139e7d authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [tf.contrib.data] map expects a nested structure. Fixes tensorflow#11786 PiperOrigin-RevId: 163359134 --- Commit d09304f authored by Jonathan Hseu<vomjom@vomjom.net> Committed by Vijay Vasudevan<vrv@google.com>: Upgrade gRPC (tensorflow#11768) * BUILD rule modifications * More build fixes * Code changes * More code fixes * Working tests * CMake build * Fix pprof * Fix header includes * CMake fix test * Bazel clean * Fix verbs * More verbs fixes * bazel clean for XLA * Windows build fix test * Add openssl/rand.h * New cmake build command * --config Release --- Commit 3cd8284 authored by David Norman<DavidNorman@users.noreply.github.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix error with default python path selection (tensorflow#11814) * Fix error with default python path selection * Move setting of environment var outside if / else --- Commit ddd8e21 authored by Eli Bendersky<eliben@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Consolidate all similar main()s in tests into a single target. PiperOrigin-RevId: 163354724 --- Commit a36bca2 authored by Tayo Oguntebi<tayo@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove ShapeWithoutPadding() utility function, as it is no longer needed. PiperOrigin-RevId: 163353430 --- Commit b26f9cd authored by David Norman<DavidNorman@users.noreply.github.com> Committed by Vijay Vasudevan<vrv@google.com>: Ensure that the multi-instruction fuse can take shared inputs (tensorflow#11748) * Ensure that the multi-instruction fuse can take shared inputs Note that the fuse action only works when the shared input / constant appears after all of its consumers in the list of instructions. * Add a comment describing the test --- Commit 34cbf16 authored by Jiri Simsa<jsimsa@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update Dataset API documentation. PiperOrigin-RevId: 163349457 --- Commit 2381ce5 authored by Abdullah Alrasheed<a.rasheed@tc-sa.com> Committed by Vijay Vasudevan<vrv@google.com>: DOC: Fix typo. (tensorflow#11813) you could could be I/O bottlenecked. TO: you could be I/O bottlenecked. --- Commit e4a5c53 authored by Toby Boyd<tobyboyd@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: ["Variable", "VariableV2", "VarHandleOp"] is the default for ps_ops=None PiperOrigin-RevId: 163344629 --- Commit 722f6f3 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix TensorForest's saveable object names so loading a savedmodel works. PiperOrigin-RevId: 163332598 --- Commit cda80a7 authored by Eric Liu<ioeric@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [tpu profiler] Dump HLO graphs in profile responses to the log directory. PiperOrigin-RevId: 163318992 --- Commit cea9ef6 authored by horance<horance-liu@users.noreply.github.com> Committed by Vijay Vasudevan<vrv@google.com>: Refactoring device name utils (tensorflow#11797) * remove duplicated code for full_name and legacy_name for DeviceNameUtils * replace tabs * Real->Device --- Commit 1f7c0f9 authored by Kongsea<kongsea@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: Refine docstrings (tensorflow#11800) --- Commit dd1f0cd authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Supports lookup devices by fullname either in the canonical form or the legacy form. This makes DeviceSet behaves the same as DeviceMgr's FindDevice method. PiperOrigin-RevId: 163300346 --- Commit 631a364 authored by Kay Zhu<kayzhu@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add Reduce, DynamicSlice and DynamicSliceUpdate to HloEvaluator. - Reduce is disabled explicitly for constant folding, as not all types of embedded computation can be currently supported by the evaluator. - Added support to evaluate HloModule to HloEvaluator. - Minor signature change to Evaluate(). PiperOrigin-RevId: 163299238 --- Commit a524701 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Sets the incarnation number even when the attribute is set. PiperOrigin-RevId: 163299121 --- Commit a49fe03 authored by Suharsh Sivakumar<suharshs@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove platform bridge for grpc_response_reader. PiperOrigin-RevId: 163295986 --- Commit 4404aa7 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add TODO comment explaining why the IsScalar check exists. PiperOrigin-RevId: 163292777 --- Commit 43036ac authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove unnecessary break statements. PiperOrigin-RevId: 163291947 --- Commit fd5de46 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add regression test for a corner case using Reduce that currently fails with the GPU backend. PiperOrigin-RevId: 163287986 --- Commit 32e198f authored by Chris Leary<leary@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [TF:XLA] Add tf.cross support. See tensorflow#11788 PiperOrigin-RevId: 163287731 --- Commit 88abddb authored by Alan Yee<alyee@ucsd.edu> Committed by Vijay Vasudevan<vrv@google.com>: Update README.md (tensorflow#11793) Remove bad practices of sudo pip and install use safer pip install commands --- Commit 9b30dc3 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove final mentions of `get_shape` in docstring. PiperOrigin-RevId: 163282839 --- Commit 423c1ee authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BREAKING CHANGE: Fix semantic error in how maybe_batch* handles sparse tensors. PiperOrigin-RevId: 163276613 --- Commit 6028c07 authored by Justin Lebar<jlebar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Highlight incoming/outgoing edges on hover in HLO graphviz dumps, and other improvements. Other improvements: - Don't show tooltips for nodes and clusters. Previously we'd show a tooltip containing a pointer value expressed as decimal. Not so useful. - Show tooltips on edges with the to/from node names. - Fix bug wherein if we had - a node at the "edge" of the graph (so its operands aren't included unless they're referenced by another node), - with all of its operands included in the graph save one or more constants, and - those constants weren't referenced by any nodes not at the edge of the graph, we would incorrectly draw the node as "grayed out", indicating that one of its operands (namely, its constant operand) wasn't present in the graph. This is wrong because constants are inlined into their users, so they should always count as "displayed" for the purposes of determining whether a node is grayed out. PiperOrigin-RevId: 163276108 --- Commit ce7a355 authored by Joshua V. Dillon<jvdillon@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update contrib/distributions/estimator_test build dependency. PiperOrigin-RevId: 163272464 --- Commit 1b8458a authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Shorten docstring line. PiperOrigin-RevId: 163269709 --- Commit 69e323c authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix comment ypo PiperOrigin-RevId: 163266376 --- Commit 08790e7 authored by Chris Leary<leary@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Fix a bug in cloning outfeeds, carried the wrong shape. PiperOrigin-RevId: 163265592 --- Commit 1bad826 authored by Yangzihao Wang<yangzihao@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Rollback of GPU kernel implementation of transpose for tensors with one small dimension. END_PUBLIC BEGIN_PUBLIC BEGIN_PUBLIC Automated g4 rollback of changelist 162525519 PiperOrigin-RevId: 163490703

GPhilo · 2017-08-21T15:50:42Z

I have a similar error message, but I'm not sure if I got the proposed solution correctly.
The code raising the error:

dataset = tf.contrib.data.TFRecordDataset(shard_files)
dataset = dataset.map(partial(decoder.decode, items=['label', 'image']))

where decoder is a tf.contrib.slim.tfexample_decoder.TFExampleDecoder()

So, if I got the issue right, since decoder.decode() returns a list [label, decoded_image_data] this is implicitly casted to a tensor (and thus the cast fails because label and image_data have different types).
However, if I write a lambda that wraps tuple() around the result of the call to decoder.decode(), this should fix the problem:

dataset = dataset.map( lambda s : tuple(decoder.decode(s, items=['label', 'image'])))

~~I still get the error, however:~~

~~TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'float32'> to <dtype: 'int64'> (Tensor is: <tf.Tensor 'distort_image/Mul:0' shape=(224, 224, 3) dtype=float32>)~~

Am I getting the solution wrong? Is this really the intended way to use the API? Even when many other TF APIs return lists instead of tuples?

EDIT: I had map() in several places and, of course, the new error was coming from the line below the one I fixed.

mrry · 2017-08-21T16:07:15Z

@GPhilo We've fixed this issue in the internal branch, and it should appear at HEAD soon. (Follow #12396 to see when the commit lands.) After the fix, the behavior of Dataset.map() will be the same if a function returns a tuple or a list containing the same elements, and the tuple(...) workaround will no longer be necessary.

cheind · 2017-08-22T08:09:35Z

@mrry sounds great!

tamizharasank · 2018-07-20T11:09:01Z

TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'Preprocessor/stack_1:0' shape=(1, 3) dtype=int32>)

mrry · 2018-07-20T15:08:13Z

@tamizharasank This issue has been closed. If you think you've found a bug, please open a new issue with enough details about your program to reproduce the problem. Otherwise, Stack Overflow may be a more useful venue for your question.

aselle assigned mrry Jul 26, 2017

aselle added the type:docs-bug Document issues label Jul 26, 2017

asimshankar assigned shivaniag Jul 26, 2017

shivaniag closed this as completed Jul 27, 2017

vrv pushed a commit to vrv/tensorflow that referenced this issue Jul 28, 2017

[tf.contrib.data] map expects a nested structure.

2139e7d

Fixes tensorflow#11786 PiperOrigin-RevId: 163359134

bhack mentioned this issue Aug 26, 2017

[Enhancement] Redesigning TensorFlow's input pipelines #7951

Closed

TomorrowIsAnOtherDay mentioned this issue Sep 4, 2017

dataset with tf.pyfunc error #12721

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

contrib.data.Dataset - doc issue with Dataset.map / tf.py_func in 1.3.0rc0 #11786

contrib.data.Dataset - doc issue with Dataset.map / tf.py_func in 1.3.0rc0 #11786

cheind commented Jul 26, 2017

aselle commented Jul 26, 2017

aselle commented Jul 26, 2017

cheind commented Jul 27, 2017 •

edited

GPhilo commented Aug 21, 2017 •

edited

mrry commented Aug 21, 2017

cheind commented Aug 22, 2017

tamizharasank commented Jul 20, 2018

mrry commented Jul 20, 2018

contrib.data.Dataset - doc issue with Dataset.map / tf.py_func in 1.3.0rc0 #11786

contrib.data.Dataset - doc issue with Dataset.map / tf.py_func in 1.3.0rc0 #11786

Comments

cheind commented Jul 26, 2017

System information

aselle commented Jul 26, 2017

aselle commented Jul 26, 2017

cheind commented Jul 27, 2017 • edited

GPhilo commented Aug 21, 2017 • edited

mrry commented Aug 21, 2017

cheind commented Aug 22, 2017

tamizharasank commented Jul 20, 2018

mrry commented Jul 20, 2018

cheind commented Jul 27, 2017 •

edited

GPhilo commented Aug 21, 2017 •

edited