Sample for report_tensor_allocations_upon_oom and RunOptions #17076
Comments
@Yagun Do you care about CPU, GPU, or both?
I got it to work like this:
This will produce messages like this:
But it seems like it does not contain all allocations, rendering it a bit pointless :/ @cy89 is there a way to get even more detail? @Yagun, for more information regarding RunOptions you may look here. I think TF is in real need of an in-depth tutorial for understanding its core and how to debug in case of errors. Handling OOM on the GPU is quite a pain without understanding the allocations.
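For readers landing here: the snippet referenced above isn't preserved in this thread, but a minimal TF 1.x sketch of how `report_tensor_allocations_upon_oom` is typically wired up looks like this (the graph below is purely illustrative):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Ask the runtime to dump per-tensor allocation info if a run() call OOMs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Illustrative graph; any model would do.
a = tf.random_uniform([1024, 1024])
b = tf.matmul(a, a)

with tf.Session() as sess:
    # The options must be passed to every run() call you want covered.
    result = sess.run(b, options=run_options)
```

On OOM, the raised error then includes a breakdown of live tensor allocations instead of just the failed allocation size.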
@cy89 @zheng-xq
I agree that this is a good feature request, i.e. there should be a guide to memory use debugging in TF, especially GPU memory use since there are some non-obvious tricks going on. Eventually someone from TF will probably get around to doing it, but probably not soon, so I'm going to mark it contributions welcome. @geogh I see that you made some progress in the other thread, using @yaroslavvb's tool. It would be great if either of you want to contribute some notes on this topic.
Kindly requesting an update: what is the status of this ticket?
I would like to know the most recent status of this ticket.
Also pinging in here. TF reports
But there doesn't seem to be a way to do that in TF 2.x.
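In case it helps others on TF 2.x: while there is no RunOptions equivalent in eager mode as far as I can tell, recent releases (2.5+) expose the GPU allocator's counters directly, which at least lets you track current and peak usage around the failing step:

```python
import tensorflow as tf

# Current and peak bytes held by the GPU allocator (TF >= 2.5).
info = tf.config.experimental.get_memory_info("GPU:0")
print("current:", info["current"], "peak:", info["peak"])

# Reset the peak counter, e.g. between training phases.
tf.config.experimental.reset_memory_stats("GPU:0")
```

This is a workaround for narrowing down where memory grows, not a replacement for the per-tensor allocation report this issue asks for.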
@Yagun, Hope this helps. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you. |
My previous comment seems to have been missed: there still doesn't seem to be a way to get the OOM info in TF 2, even though TF 2's own error message suggests it. See also #37556 and tensorflow/core/common_runtime/executor.cc, lines 989 to 991 at commit 78d7f8b.
@Flamefire,
Not yet, as it involves some setup and usage of a notebook, which is a bit involved when running on HPC nodes. Adding an option to produce some output on stdout/stderr in case of failure would have been much more usable in that context. And if that is not possible, then TF should not claim otherwise in its error message. I spent quite some time trying to follow the advice to "add report_tensor_allocations_upon_oom to RunOptions".
Closing as stale. Please reopen if you'd like to work on this further. |
What's wrong with the bot? There has been activity here since its last comment, so why close it? |
@Flamefire,
Amazing. This bug is 4 years old and still open.
This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you. |
5 yrs... |
This is a feature request.
Please add an example to the docs describing how to use report_tensor_allocations_upon_oom and the other options of RunOptions.
All I could find is this file:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/profiler/model_analyzer_test.py
But it is not obvious. For example, it contains:
and then
And more questions arise, like "What is config_pb2?", etc.
Thanks.
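To answer the config_pb2 question for future readers: config_pb2 is the generated protobuf module tensorflow.core.protobuf.config_pb2, which defines the ConfigProto, RunOptions, and RunMetadata messages; the RunOptions exposed in the public TF 1.x API is the same message class. A small sketch:

```python
from tensorflow.core.protobuf import config_pb2

# The same message class that tf.compat.v1.RunOptions exposes publicly.
opts = config_pb2.RunOptions(report_tensor_allocations_upon_oom=True)
assert opts.report_tensor_allocations_upon_oom
```

So the test file linked above is simply constructing the protobuf directly rather than going through the tf.* alias.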