Error during training: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1 #45658
@sergorl, could you please update TensorFlow, CUDA, cuDNN, and the NVIDIA drivers to the latest versions, following the installation guide, and check whether you still face the same issue? Thanks!
@amahendrakar, I did as you suggested and updated CUDA, cuDNN, and the GPU driver. But now everything gets stuck: there is no CUDA activity, yet memory is still being consumed, and training freezes on the first iteration.
@sergorl, also, please run the code snippet below and share the output with us. Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
Hi @sergorl, Thanks,
System information
You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:
TF 1.x: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
TF 2.x: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
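The version commands above report only TensorFlow itself. The sketch below, which assumes TensorFlow 2.1 or later, gathers the other details discussed in this thread in one go: the TensorFlow version, whether the build has CUDA support, which GPUs TensorFlow can see, and the CUDA/cuDNN versions the wheel was built against. The function name `collect_env_info` is hypothetical. It is not part of TensorFlow or of the official environment capture script.

```python
import platform


def collect_env_info():
    """Gather version details that a TensorFlow GPU bug report usually needs.

    Hypothetical helper (not a TensorFlow API). TensorFlow-specific keys are
    only filled in when TensorFlow can be imported.
    """
    info = {"python": platform.python_version(), "tensorflow": None}
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed (or failed to load) in this interpreter.
        return info

    info["tensorflow"] = tf.version.VERSION
    info["git_version"] = tf.version.GIT_VERSION
    info["built_with_cuda"] = tf.test.is_built_with_cuda()
    # Physical GPUs visible to TensorFlow (empty list if none are found).
    info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    # tf.sysconfig.get_build_info() only exists in TF 2.3+. It reports the
    # CUDA/cuDNN versions the wheel was built against, not the ones installed.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_build_info is not None:
        build = get_build_info()
        info["cuda_version"] = build.get("cuda_version")
        info["cudnn_version"] = build.get("cudnn_version")
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Comparing `cuda_version` and `cudnn_version` from this output against what is actually installed on the machine is one quick way to spot a toolkit/driver mismatch after an upgrade.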
Describe the current behavior
While training a neural network, I hit this error on the 17th epoch:
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
I reran training many times, and each time it failed at a different epoch.
Describe the expected behavior
Training should run stably to completion instead of aborting partway through.
Standalone code to reproduce the issue
I am running the code from this repository: https://github.com/arthurflor23/handwritten-text-recognition
Other info / logs
This code:
gives me: