tf.keras.backend.eye crash (abort) with large input #57711

Closed
DNXie opened this issue Sep 15, 2022 · 8 comments
Assignees
Labels: comp:ops, stale, stat:awaiting response, TF 2.10, type:bug

Comments

@DNXie

DNXie commented Sep 15, 2022


Issue Type

Bug

Source

binary

Tensorflow Version

2.11.0-dev20220914

Custom Code

No

OS Platform and Distribution

Ubuntu 18.04.4 LTS (x86_64)

Mobile device

No response

Python version

3.7.6

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

N/A

GPU model and memory

No response

Current Behaviour?

tf.keras.backend.eye and tf.eye crash (abort) with large input

Also reproduced in this gist

Standalone code to reproduce the issue

import tensorflow as tf

tf.keras.backend.eye(size=2752212975)  # aborts with a CHECK failure
tf.eye(2752212975)                     # same abort

Relevant log output

2022-09-15 18:51:32.477313: F tensorflow/core/framework/tensor_shape.cc:572] Check failed: size >= 0 (0 vs. -1542754321)
Aborted (core dumped)
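
Editor's note (an assumption, not confirmed in the thread): the negative value in the CHECK failure is consistent with the requested size overflowing a signed 32-bit integer, since 2752212975 is larger than int32 max (2147483647) and 2752212975 - 2**32 = -1542754321. A minimal sketch of the arithmetic:

# The requested size does not fit in a signed 32-bit integer; wrapping it
# into the signed 32-bit range reproduces the value in the log above.
size = 2752212975
wrapped = size - 2**32 if size > 2**31 - 1 else size
print(wrapped)  # -1542754321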
@tilakrayal
Contributor

@sachinprasadhs,
I was able to reproduce the issue on tensorflow v2.8, v2.9 and nightly. Kindly find the gist of it here.

@sachinprasadhs
Contributor

This is due to the very large input, which causes an OOM / memory overflow. When you try a large value such as tf.int32.max, you get the error output below.

import tensorflow as tf
tf.eye(2147483647)

ResourceExhaustedError: OOM when allocating tensor with shape[2147483647,2147483647] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu [Op:MatrixDiagV3] name: diag 
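
One possible workaround for callers (an illustrative sketch only, not part of TensorFlow's API; safe_eye is a hypothetical helper name): validate the requested size before calling tf.eye, so an oversized request surfaces as an ordinary Python exception instead of reaching the CHECK failure or an OOM abort.

import tensorflow as tf

INT32_MAX = 2**31 - 1  # shape dimensions must fit in a signed 32-bit range

def safe_eye(size, dtype=tf.float32):
    # Hypothetical wrapper: reject sizes that cannot form a valid shape
    # before they reach tf.eye.
    if not 0 <= size <= INT32_MAX:
        raise ValueError(
            f"size {size} is outside the supported range [0, {INT32_MAX}]")
    return tf.eye(size, dtype=dtype)

safe_eye(3)            # works as usual
# safe_eye(2752212975) # raises ValueError instead of aborting the process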

@DNXie
Author

DNXie commented Sep 19, 2022

@sachinprasadhs Hi, thanks for looking into this. With the input I provided, I see a crash (abort) rather than an OOM error.

As these are public APIs, it would be great if the functions threw exceptions for such cases instead of crashing.
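
Editor's sketch of why an exception matters here (assuming, as the logs above suggest, that the OOM path raises an ordinary Python error while the CHECK failure aborts the process): the OOM case can be caught and handled, but the abort cannot.

import tensorflow as tf

try:
    tf.eye(2147483647)  # large but representable size: raises an OOM error
except tf.errors.ResourceExhaustedError as e:
    print("caught:", type(e).__name__)

# tf.eye(2752212975), by contrast, trips the CHECK failure inside the shape
# code and aborts the whole process, so no try/except can recover from it.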

google-ml-butler bot removed the stat:awaiting response label on Sep 19, 2022
sachinprasadhs added the stat:awaiting tensorflower label on Sep 20, 2022
@yongtang
Member

Added a PR #57790 for the fix.

@tilakrayal
Contributor

@DNXie,
When I tried to execute the code on tf-nightly, the crash happened in Colab, and the runtime logs showed the OOM error as a warning for the large input. Kindly find the screenshots below for reference.

[Screenshots: Colab runtime logs showing the OOM warning, captured 2024-05-16]

tilakrayal added the stat:awaiting response label and removed the stat:awaiting tensorflower label on May 16, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions bot added the stale label on May 24, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.
