Commit

Merge pull request #37853 from ashutosh1919:tf_dataset_map
PiperOrigin-RevId: 304478576
Change-Id: I73a15d650b99d593e83d79e5e5494e2eb181bcb8
tensorflower-gardener committed Apr 2, 2020
2 parents 04f2ff0 + 16cc300 commit 8d366c3
Showing 1 changed file with 17 additions and 0 deletions.
17 changes: 17 additions & 0 deletions tensorflow/python/data/ops/dataset_ops.py
@@ -1589,6 +1589,23 @@ def map(self, map_func, num_parallel_calls=None, deterministic=None):
>>> list(d.as_numpy_iterator())
[b'HELLO', b'WORLD']
3) Use `tf.numpy_function`, which also allows you to write arbitrary
Python code. Note that the function passed to `tf.py_function` receives
`tf.Tensor` arguments, whereas the function passed to `tf.numpy_function`
receives and returns only NumPy arrays. For example:
>>> d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])
>>> def upper_case_fn(t: np.ndarray):
...   return t.decode('utf-8').upper()
>>> d = d.map(lambda x: tf.numpy_function(func=upper_case_fn,
...           inp=[x], Tout=tf.string))
>>> list(d.as_numpy_iterator())
[b'HELLO', b'WORLD']
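The function handed to `tf.numpy_function` is ordinary Python, so it can be checked on its own before it is wired into `map`. The following is a minimal sketch, assuming (as the `.decode` call in the example implies) that a scalar `tf.string` element reaches the function as a `bytes` value. The `samples` list is illustrative and not part of the commit.

```python
def upper_case_fn(t):
    # Assumption: a scalar tf.string element arrives here as a bytes value,
    # which is why the docstring example can call .decode() on it.
    return t.decode('utf-8').upper()


# Illustrative inputs standing in for the dataset elements.
samples = [b'hello', b'world']
results = [upper_case_fn(s) for s in samples]
print(results)  # ['HELLO', 'WORLD']
```

Inside the pipeline, `Tout=tf.string` means the returned values come back as strings, which is consistent with the `b'HELLO'` and `b'WORLD'` shown in the doctest output.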
Note that using `tf.numpy_function` or `tf.py_function` in general
prevents user-defined transformations from executing in parallel
(because of the Python GIL).
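The GIL remark can be illustrated without TensorFlow. The sketch below is a plain-Python stand-in, not the `tf.data` implementation: `cpu_bound_upper` is a hypothetical function playing the role of the Python code wrapped by `tf.numpy_function`, and a thread pool plays the role of parallel `map` workers. On a standard CPython build with the GIL, the threaded run typically takes about as long as the sequential one, because only one thread executes Python bytecode at a time. Exact timings vary by machine, so none are shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_upper(raw):
    # Hypothetical stand-in for a Python function wrapped by
    # tf.numpy_function: some pure-Python work, then the transformation.
    total = 0
    for i in range(200_000):
        total += i
    return raw.decode('utf-8').upper()


items = [b'hello', b'world'] * 4

# Run the function on each element one after another.
start = time.perf_counter()
sequential = [cpu_bound_upper(x) for x in items]
sequential_s = time.perf_counter() - start

# Run the same work on four threads. With the GIL, the threads take
# turns executing Python bytecode rather than running simultaneously.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(cpu_bound_upper, items))
threaded_s = time.perf_counter() - start

print(f'sequential: {sequential_s:.3f}s, threaded: {threaded_s:.3f}s')
```

Both runs produce the same results in the same order. Only the wall-clock time is of interest here.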
Performance can often be improved by setting `num_parallel_calls` so that
`map` will use multiple threads to process elements. If deterministic order
isn't required, it can also improve performance to set