Tensorflow model server crashing with error: Check failed: c->in_use(); tf_serving_entrypoint.sh: line 3: 8 Aborted #1206
Comments
That appears to come from the memory allocator when it tries to free memory that it thinks has already been freed (or is still in use for some reason). This could be a bug in code (say, a memory leak) or simply a side effect of running out of memory for other reasons. There's not enough information here to debug further, though. This is deep in Tensorflow core logic, so if you can reproduce the issue, you might want to file an issue on the Tensorflow project.
@dipsatch your problem is definitely related to memory (whether in your own code or in tensorflow itself). I had the same issue and saw that, right before the tf server crashed, memory usage was at capacity.
Closing this issue as no response was received from the user. Feel free to post updates (if any) and we will reopen the issue.
I'm having the same issue. The server reaches ~95% utilization and crashes after a few iterations of training. I'm using tf version 1.12.0.
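If the crash really is the allocator giving out under memory pressure, as the comments above suggest, one common mitigation for those hitting this in their own TF 1.x training code is to keep TensorFlow from pre-allocating (nearly) the whole GPU. A minimal sketch — the 0.8 cap is illustrative, not a value recommended anywhere in this thread:

```python
import tensorflow as tf

# Grow the GPU allocation on demand instead of grabbing almost all
# device memory up front, and/or hard-cap the fraction of the GPU
# this process may use.
gpu_options = tf.GPUOptions(
    allow_growth=True,
    per_process_gpu_memory_fraction=0.8,  # illustrative 80% cap
)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... build and run the training loop as usual ...
    pass
```

For the serving case, recent tensorflow_model_server builds expose a corresponding --per_process_gpu_memory_fraction flag.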
This exception might be related to tensorflow/tensorflow#22581: https://github.com/tensorflow/tensorflow/issues/22581
I had the same issue and was able to solve it by pulling the most recent
I got the same issue, and my tf version is 1.12.0. Does anyone know about this?
@YananJian Have you found a solution? I've run into the same problem and need your help, thanks.
@zhouyuangan tf==1.9.0 will be ok! |
Yes, tf==1.10.0 works. |
Got the same issue while using tensorflow/serving:latest-gpu. Tested a newer image with three streams and found the problem is solved in tensorflow/serving:1.13-gpu.
I am running an object detection model using the tensorflow/serving:latest-gpu docker image & nvidia-docker on an Amazon Deep Learning AMI (EC2 P3 instance). The model server starts up fine. Then I run a gRPC client that loops through several images, sending them over to the server to fetch predictions. I am getting the expected, quick predictions, and the server runs at ~95% GPU utilization (memory used is below limits). However, the model server often crashes after giving continuous predictions for a while. The error it gives right before crashing is:
Check failed: c->in_use(); tf_serving_entrypoint.sh: line 3: 8 Aborted
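For reference, the client loop is essentially the following minimal sketch; the server address, model name ("detector"), and the signature/input/output keys are assumptions for illustration, not details from the original report:

```python
import glob

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Open a channel to the model server's gRPC port (8500 by default).
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

for path in glob.glob("images/*.jpg"):
    with open(path, "rb") as f:
        image_bytes = f.read()

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "detector"                   # assumed model name
    request.model_spec.signature_name = "serving_default"  # assumed signature
    # Object-detection SavedModels commonly take a batch of encoded image
    # bytes under the "inputs" key; adjust to your model's signature.
    request.inputs["inputs"].CopyFrom(
        tf.make_tensor_proto([image_bytes], shape=[1]))

    # Blocking call with a 30-second deadline.
    response = stub.Predict(request, 30.0)
    # "num_detections" is typical of TF Object Detection API exports;
    # your model's output keys may differ.
    print(path, response.outputs["num_detections"].float_val)
```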
I have tried sending larger payloads from the client to the server and have observed resource exhaustion errors, which makes sense since the GPU runs out of memory. But I am not able to understand what exactly is causing the above issue.
Can someone please help?
Thanks in advance.