InvalidArgumentError when running map_fn on strings inside a tf.function #28007
@ipod825 I ran the code in Google Colab without any error. Could you try it in Google Colab and see whether the issue persists there? If it doesn't, then try upgrading your TF 2.0 installation and run the code again. Thanks!
You need to run it on GPU.

```
!pip install tensorflow-gpu==2.0.0-alpha0
```

```python
import tensorflow as tf
from tensorflow.keras import layers

H, W, C = 10, 10, 3
imgs = tf.zeros([10, H, W, C])
ds = tf.data.Dataset.from_tensor_slices(imgs)
ds = ds.batch(2)
conv = layers.Conv2D(32, (4, 4), strides=(2, 2), padding='same')

@tf.function
def run(img, i):
    conv(img)
    tf.summary.image('img', img, i)

if __name__ == "__main__":
    train_summary_writer = tf.summary.create_file_writer('/tmp/testsummary')
    with tf.device('/gpu:0'):
        with train_summary_writer.as_default():
            for i, img in enumerate(ds):
                run(img, i)
                print(i)
```
Sorry for my poor English. I have the same problem, but I found a solution, and now everything is fine.
Automatically closing this out since I understand it to be resolved, but please let me know if I'm mistaken. Thanks!
Hi @jvishnuvardhan, the snippet I wrote still fails with the same error message in Colab (using the GPU accelerator). The 2.0-beta version fails as well. Could you please describe how you made it work?
Well, it seems to be just a workaround to me. The main issue here is that the summary operation raises an error when running on the GPU. Forcing the operation to run on the CPU doesn't really solve the problem; it just sidesteps it. I don't know how the summary operation works internally; probably, even when running on the GPU, it would still copy the tensor back to CPU memory (which would then be similar to explicitly asking it to run on the CPU). Even if that is the case (and if not, we lose some efficiency), from an API point of view I don't think this issue is solved, as someone might encounter the same problem and not know why it happens or how to solve it without stumbling onto this thread.
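For readers landing here, the CPU-pinning workaround being discussed looks roughly like the following sketch (hypothetical, not code from this thread; the `tf.nn.relu` call and the `/tmp/pin_summary_demo` log directory are placeholders of mine):

```python
import tensorflow as tf

# Sketch of the workaround: pin only the summary op to the CPU, leaving
# the rest of the traced function on the default (e.g. GPU) device.
writer = tf.summary.create_file_writer('/tmp/pin_summary_demo')

@tf.function
def step(img, i):
    y = tf.nn.relu(img)        # stand-in for the expensive GPU work
    with tf.device('/cpu:0'):  # summary op explicitly placed on the CPU
        tf.summary.image('img', img, step=i)
    return y

with writer.as_default():
    step(tf.zeros([2, 8, 8, 3]), tf.constant(0, dtype=tf.int64))
```

Whether this loses efficiency depends, as noted above, on whether the summary op would have copied the tensor back to host memory anyway.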
@ipod825 I have the same problem (I did try the TF 2.0 alphas and betas) and agree that assigning the summary op to /cpu:0 is only a workaround. Moreover, the fix does not work for me when I build from the r2.0 branch from source. I took a look at GitHub: the map_fn in line 75 is causing the issues.
@ipod825 @loffermann Sorry for closing. Reopened. Thanks!
I ran into a similar issue when I tried to save images with tf.summary.image using MirroredStrategy. Oddly enough, when debugging with tf.config.experimental_run_functions_eagerly(True), this error does not occur.

```
(0) Invalid argument: 2 root error(s) found.
Function call stack:
```
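For reference, that debugging toggle (named tf.config.experimental_run_functions_eagerly at the time; renamed tf.config.run_functions_eagerly in TF 2.3+) is set globally before any @tf.function is called. A minimal sketch, with a string map_fn of my own as the example workload:

```python
import tensorflow as tf

# Run all tf.function bodies eagerly, skipping the graph-mode
# device-copy path that triggers the error in this thread.
tf.config.run_functions_eagerly(True)

@tf.function
def lengths(strings):
    # Hypothetical example: map a string op over a 1-D string tensor.
    return tf.map_fn(tf.strings.length, strings, fn_output_signature=tf.int32)

out = lengths(tf.constant(['ab', 'cde']))
```

Because eager mode keeps string tensors on the host, this hides the bug rather than fixing it, which is why it is useful for debugging only.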
I also ran into this issue. Here's a fairly minimal piece of code that reproduces it:

```python
import tensorflow.compat.v2 as tf
tf.enable_v2_behavior()

def decode_png(data):
    return tf.image.decode_png(data)

@tf.function  # <= No exception if you comment this line out
def decode_all(images):
    return tf.map_fn(decode_png, images, dtype=tf.uint8)

img = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15\xc4\x89\x00\x00\x00\rIDATx\xdac\xfc\xcf\xf0\xbf\x1e\x00\x06\x83\x02\x7f\x94\xad\xd0\xeb\x00\x00\x00\x00IEND\xaeB`\x82'
images = tf.constant([img, img])
decode_all(images)
```

and here's the full stack trace:

I ran this on Colab with a GPU runtime, using TF 1.15.0rc3. It will probably fail on TF 2.0.0 as well, but I haven't tried.
A modification of the earlier gist posted by @jvishnuvardhan causes the error to occur again. Here it is. I am using both the GPU and the CPU inside the decorated function because there might be a computationally expensive part that I have to run on the GPU; the CPU is used to run the summary ops as a workaround to this bug. However, the error still occurs. The only way to get around this is to run the entire function on the CPU when calling it, as in @jvishnuvardhan's gist, which is not ideal when I want to train a network.
@Lannister-Xiaolin, I have not tried. I ran the program locally, without wrapping map_fn in a Keras layer (as in your example).
Hi, sorry for my poor English. I solved this problem by logging the dataset images outside the @tf.function.
@TIGERCHANG123 - interesting! Can you please share a code example?
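In the absence of a shared example, the pattern described above presumably looks something like this sketch (my own illustration, not @TIGERCHANG123's actual code; the loss computation and `/tmp/log_outside_fn` directory are placeholders): keep summary ops in eager code and compile only the numeric work.

```python
import tensorflow as tf

writer = tf.summary.create_file_writer('/tmp/log_outside_fn')

@tf.function
def train_step(img):
    return tf.reduce_mean(img)  # compiled numeric work only; no summary ops

ds = tf.data.Dataset.from_tensor_slices(tf.zeros([4, 8, 8, 3])).batch(2)
with writer.as_default():
    for i, img in enumerate(ds):
        loss = train_step(img)                 # runs in graph mode
        tf.summary.image('img', img, step=i)   # runs eagerly, outside the function
```

Since the summary op never enters the traced graph, no variant host-to-device copy of string data is attempted.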
I'm having the same issue using map_fn to decode an image list, even when using my CPU.
I'm having this issue too with TF 2.2rc3.
Hi, thanks, this trick works for my case as well.
@saikumarchalla Yes, here's a Colab notebook. |
@rharish101 @saikumarchalla I think this has been fixed. I just ran the colab from @rharish101 with tf nightly, and am not seeing an error. Please confirm. |
@nikitamaia Yes, this is fixed on TensorFlow 2.3.0. I tested this on the Colab and also on my local machine (running TF 2.3.0 on Arch Linux).
Closing this issue now since the bug has been fixed. |
I am getting a similar error on TF 2.4.1.

```python
import tensorflow as tf

def _get_char_flags(char_tensor):
    char = char_tensor.numpy().decode()
    return char.isalpha(), char.isspace()

@tf.function  # <--------------- Runs fine with this line commented out.
def tf_split_text(text):
    tf.assert_rank(text, 0)
    tf.debugging.assert_type(text, tf.string)
    chars = tf.strings.unicode_split(text, input_encoding='UTF-8')
    is_alpha, is_space = tf.map_fn(lambda char: tf.py_function(_get_char_flags, char, Tout=[tf.bool, tf.bool]),
                                   (chars,), dtype=tf.bool, parallel_iterations=True,
                                   fn_output_signature=[tf.bool, tf.bool])
    is_alpha = tf.concat([is_alpha, [False]], axis=0)
    is_space = tf.concat([is_space, [True]], axis=0)
    is_special = ~(is_alpha | is_space)
    is_non_alpha = ~is_alpha
    is_non_space = ~is_space
    was_special = tf.concat([[False], is_special[:-1]], axis=0)
    was_non_alpha = tf.concat([[True], is_non_alpha[:-1]], axis=0)
    was_non_space = tf.concat([[False], is_non_space[:-1]], axis=0)
    any_to_special = is_special
    non_alpha_to_non_space = was_non_alpha & is_non_space
    token_start_flags = any_to_special | non_alpha_to_non_space
    token_start_indices = tf.where(token_start_flags)[:, 0]
    special_to_any = was_special
    non_space_to_non_alpha = was_non_space & is_non_alpha
    token_end_flags = special_to_any | non_space_to_non_alpha
    token_end_indices = tf.where(token_end_flags)[:, 0]
    tf.debugging.assert_equal(tf.size(token_start_indices), tf.size(token_end_indices))
    preceding_space = tf.concat([[False], is_space[:-1]], axis=0)
    tokens = tf.strings.substr(text, token_start_indices, token_end_indices - token_start_indices, unit='UTF8_CHAR')
    has_preceding_space = tf.gather(preceding_space, token_start_indices)
    tf.assert_rank(has_preceding_space, 1)
    tf.assert_rank(tokens, 1)
    tf.assert_equal(tf.reduce_sum(tf.map_fn(tf.strings.length, tokens, fn_output_signature=tf.int32)) +
                    tf.reduce_sum(tf.cast(has_preceding_space, tf.int32)),
                    tf.strings.length(text))
    return has_preceding_space, tokens

text = tf.constant('hi there', tf.string)
preceding_spaces, words = tf_split_text(text)
print(words)
```

```
Traceback (most recent call last):
  File "/home/hosford42/PycharmProjects/ImageParser/error.py", line 56, in <module>
    preceding_spaces, words = tf_split_text(text)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 894, in _call
    return self._concrete_stateful_fn._call_flat(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 555, in call
    outputs = execute.execute(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: 2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
  [[{{node map_1/TensorArrayUnstack/TensorListFromTensor/_96}}]]
  [[map_1/while/loop_body_control/_61/_107]]
  (1) Invalid argument: 2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
  [[{{node map_1/TensorArrayUnstack/TensorListFromTensor/_96}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_tf_split_text_265]

Function call stack:
tf_split_text -> tf_split_text
```
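As a side note for anyone puzzling over what that snippet computes, the token-boundary logic can be rendered in plain Python. The sketch below is my own derivation from the tf.concat/boolean-flag steps above, not part of the original report:

```python
def split_text(text):
    # Per-character flags, plus a sentinel "space" at the end, mirroring the
    # tf.concat([..., [False]]) / tf.concat([..., [True]]) calls above.
    is_alpha = [c.isalpha() for c in text] + [False]
    is_space = [c.isspace() for c in text] + [True]
    # "Special" = neither alphabetic nor whitespace (punctuation, digits, ...).
    is_special = [not a and not s for a, s in zip(is_alpha, is_space)]
    n = len(is_alpha)
    # A token starts at any special char, or where a non-space char follows
    # a non-alpha char (or begins the string).
    starts = [i for i in range(n)
              if is_special[i]
              or ((i == 0 or not is_alpha[i - 1]) and not is_space[i])]
    # A token ends after any special char, or where a non-alpha char follows
    # a non-space char.
    ends = [i for i in range(n)
            if i > 0 and (is_special[i - 1]
                          or (not is_space[i - 1] and not is_alpha[i]))]
    tokens = [text[s:e] for s, e in zip(starts, ends)]
    has_preceding_space = [s > 0 and text[s - 1].isspace() for s in starts]
    return has_preceding_space, tokens
```

For `'hi there'` this yields tokens `['hi', 'there']` with preceding-space flags `[False, True]`, matching what `tf_split_text` is intended to return; the GPU error above is about how the string tensors are moved, not about this logic.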
Hi @hosford42, I just ran your code sample in Colab and I'm not seeing any errors. Please let me know if I'm missing something and open a new issue for further debugging. |
@nikitamaia, I'm not using Colab, so perhaps the environment is the issue. I'm running Python 3.8.5 on an Ubuntu machine. I can provide more details if necessary. Here is the full output when I start Python and import TensorFlow, to show the version number:

```
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.13.0
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0] on linux

import tensorflow as tf
2021-02-22 12:46:55.445604: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Invalid MIT-MAGIC-COOKIE-1 key

In[3]: tf.__version__
Out[3]: '2.4.1'
```
And here is the logging output when I start a session, in case GPU-specific info is needed:
Actually, I just ran the Colab gist that I shared in my earlier post, but this time with a GPU runtime. I'm now seeing the same error message that you reported, so this seems to be a GPU-related issue. Can you open a new bug with all of this information? Thanks!
**System information**
- 3.7.1
- cudatoolkit-10.0.130-0
- cudnn-7.3.1-cuda10.0_0
- GeForce RTX 2080 Ti

**Describe the current behavior**
Running the provided code on GPUs leads to the error message:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
```

Without feeding the tensor to the convolution layer, `summary.image` would succeed.

**Describe the expected behavior**
Should run smoothly.

**Code to reproduce the issue**

**Other info / logs**