TF_SessionRun() from C API crashes when not enough RAM #53413
Labels
comp:runtime (c++ runtime, performance issues (cpu))
stat:awaiting tensorflower (Status: Awaiting response from tensorflower)
TF 2.7 (Issues related to TF 2.7.0)
type:bug (Bug)
Comments
tilakrayal added the TF 2.7 and comp:runtime labels on Dec 14, 2021
sachinprasadhs added the stat:awaiting tensorflower label on Dec 15, 2021
Is anyone working on this issue?
I'm seeing the same crash exception in my project, using the TensorFlow 2.4.0 C API.
The next callstack is from the calling method in the same dump file:
System information
TensorFlow C API called from C++ code; Microsoft Visual Studio 2019 (compiler version MSVC 14.26).
Describe the current behavior
When there is not enough memory, TF_SessionRun() crashes. The Visual Studio debugger shows that TF_SessionRun() throws std::bad_alloc. If I set breakpoints immediately before and after the call to TF_SessionRun(), then after it executes the debugger reports the following exception message:
"Exception thrown at 0x00007FFB9F7EA859 in MyProject.exe: Microsoft C++ exception: std::bad_alloc at memory location 0x000000F789CFD580"
And here's the callstack from where the exception is thrown.
[External Code]
And the worst thing is that I cannot wrap this call to TF_SessionRun() in a try-catch: the compiler optimizes the handler away, since we are calling C code, and C functions aren't allowed to throw exceptions. But even if the exception could be caught, propagating it across the C boundary is technically undefined behavior, and I cannot rely on catching exceptions from tensorflow.dll, because 1) it was built with a different compiler, and 2) it is a C API and should not throw anything.
I attached files.zip, which contains a main.cpp file that reproduces the problem. It reads a neural network protobuf (.pb) file and tries to run it. The script that generated the neural network, and the network itself, are also inside files.zip.
On line 11 of main.cpp there is a variable SIZE (const size_t SIZE = 500;). On my machine TF_SessionRun() crashes when SIZE is around 500; the threshold can vary from machine to machine, so try different values to reproduce the problem on yours.
So, the problem is that if SIZE is sufficiently big, the call to TF_SessionRun() on line 63 crashes, and we never reach the code below it that would print either "FINISHED SUCCESSFULLY" or "FINISHED WITH ERROR".
(If SIZE is sufficiently small, the program finishes and prints "FINISHED SUCCESSFULLY".)
Describe the expected behavior
If there is not enough memory, TF_SessionRun() should return an error code via its TF_Status* argument instead of crashing.
If we reach the call to TF_SessionRun() in main.cpp, then after the call the program should print either "FINISHED SUCCESSFULLY" or "FINISHED WITH ERROR".
Standalone code to reproduce the issue
I attached an archive, files.zip. It contains 3 files:
main.cpp - C++ code that reproduces the problem
frozen_graph.pb - the neural network protobuf file used inside main.cpp
generate_graph.py - the script that generates frozen_graph.pb
I think you can reproduce this bug (where TF_SessionRun() crashes because there isn't enough RAM) with any sufficiently complex neural network that consumes lots of RAM. I actually encountered it with a different neural network, which I cannot share because it is not my intellectual property.
files.zip