Tensorflow model server crashing with error : Check failed: c->in_use(); tf_serving_entrypoint.sh: line 3: 8 Aborted #1206

Closed
dipsatch opened this issue Dec 8, 2018 · 11 comments

Comments

@dipsatch

dipsatch commented Dec 8, 2018

I am running an object detection model using the tensorflow/serving:latest-gpu Docker image and nvidia-docker on an Amazon Deep Learning AMI (EC2 P3 instance). The model server starts up fine. Then I run a gRPC client that loops through several images and sends them to the server to fetch predictions. The predictions are quick and as expected, and the server runs at ~95% GPU utilization (memory used is below limits).

However, the model server often crashes after serving predictions continuously for a while. The error it gives right before crashing is:

F external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:458] Check failed: c->in_use() && (c->bin_num == kInvalidBinNum) 
/usr/bin/tf_serving_entrypoint.sh: line 3: 8 Aborted tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

I have tried sending larger payloads from the client to the server and have observed resource exhaustion errors, which makes sense since the GPU runs out of memory. But I cannot work out what exactly is causing the crash above.
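One way to keep each request's payload bounded is to split the images into small batches on the client side. The following is only a minimal sketch: it builds JSON bodies in the row format (`{"instances": [...]}`) for the REST port 8501 shown in the log rather than for the gRPC client used above, and the names `chunked`, `predict_bodies` and `max_batch` are hypothetical, as is the stand-in image data.

```python
import json
from typing import Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def predict_bodies(images: Sequence, max_batch: int) -> Iterator[str]:
    """Build one JSON request body per batch, in row format."""
    for batch in chunked(images, max_batch):
        yield json.dumps({"instances": batch})


# Hypothetical stand-in data: three tiny "images" as nested lists.
images = [[[0, 0, 0]], [[1, 1, 1]], [[2, 2, 2]]]
bodies = list(predict_bodies(images, max_batch=2))
print(len(bodies))  # 2 request bodies: one with two images, one with one
```

Each body would then be POSTed to the model's `:predict` endpoint. A suitable `max_batch` depends on the model and the GPU, so it has to be found by experiment.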

Can someone please help?

Thanks in advance.

@ymodak self-assigned this Dec 10, 2018
@gautamvasudevan
Collaborator

That appears to come from the memory allocator when it tries to free memory that it thinks has already been freed (or whose state is otherwise inconsistent for some reason). This could be a bug in code (say, a memory leak) or simply a side effect of running out of memory for other reasons.

There's not enough information here to debug anything further, though. This is deep in TensorFlow core logic, so if you can reproduce the issue, you might want to file an issue on the TensorFlow project.
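To illustrate what a check of this shape guards against, here is a toy model of the bookkeeping. It is not TensorFlow's BFC allocator: `ToyAllocator` and `Chunk` are hypothetical names, and a single free list stands in for all bins. The only thing taken from the log is the condition itself, namely that a chunk being freed must be marked in use and must not already sit in a bin.

```python
K_INVALID_BIN_NUM = -1  # mirrors the kInvalidBinNum name from the log message


class Chunk:
    def __init__(self) -> None:
        self.in_use = False
        self.bin_num = K_INVALID_BIN_NUM


class ToyAllocator:
    """Toy model of the bookkeeping only; not TensorFlow's implementation."""

    def __init__(self) -> None:
        self.free_bin = []  # one free list stands in for all bins

    def allocate(self) -> Chunk:
        # Reuse a free chunk if there is one, otherwise create a new one.
        chunk = self.free_bin.pop() if self.free_bin else Chunk()
        chunk.in_use = True
        chunk.bin_num = K_INVALID_BIN_NUM
        return chunk

    def free(self, chunk: Chunk) -> None:
        # Same shape as the failing check: in use, and not in any bin.
        if not (chunk.in_use and chunk.bin_num == K_INVALID_BIN_NUM):
            raise RuntimeError(
                "Check failed: c->in_use() && (c->bin_num == kInvalidBinNum)"
            )
        chunk.in_use = False
        chunk.bin_num = 0
        self.free_bin.append(chunk)


allocator = ToyAllocator()
chunk = allocator.allocate()
allocator.free(chunk)      # first free is fine
try:
    allocator.free(chunk)  # second free trips the check
except RuntimeError as err:
    print(err)
```

In this toy, a double free is the simplest way to trip the check. In the real server the inconsistent state could arise in other ways, which is why a reproducible case is needed to debug further.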

@echan00

echan00 commented Dec 15, 2018

@dipsatch your problem is definitely related to memory (whether in your own code or in TensorFlow itself). I had the same issue and saw that memory usage was at capacity before the TF server crashed.
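To confirm whether memory is at capacity before a crash, GPU memory can be sampled with `nvidia-smi` while the client runs. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names and the 0.95 threshold are hypothetical. Note that TensorFlow by default reserves most of the GPU memory at startup, so a high reading on its own does not prove a leak.

```python
import subprocess
from typing import List, Tuple


def parse_memory_csv(text: str) -> List[Tuple[int, int]]:
    """Parse lines like '15900, 16160' into (used_mib, total_mib) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Ask nvidia-smi for used/total memory per GPU (needs an NVIDIA driver)."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_memory_csv(out)


def near_capacity(pairs: List[Tuple[int, int]], threshold: float = 0.95) -> List[bool]:
    """Flag each GPU whose used/total ratio is at or above `threshold`."""
    return [used / total >= threshold for used, total in pairs]


# Example with made-up readings for two GPUs (values in MiB).
sample = "15900, 16160\n1200, 16160\n"
print(near_capacity(parse_memory_csv(sample)))  # [True, False]
```

Calling `query_gpu_memory()` in a loop alongside the client and logging the result with a timestamp would show whether usage climbs steadily before the abort.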

@Harshini-Gadige

Closing this issue as no response has been received from the user. Feel free to post updates (if any) and we will reopen the issue.

@YananJian

I'm having the same issue. Server is getting ~95% utilization and crashes after a few iterations of training. I'm using tf version 1.12.0.
The error I'm getting is F tensorflow/core/common_runtime/bfc_allocator.cc:458] Check failed: c->in_use() && (c->bin_num == kInvalidBinNum) Aborted

@schen119

schen119 commented Feb 5, 2019

This exception might be related to this issue #22581 https://github.com/tensorflow/tensorflow/issues/22581

@wronk

wronk commented Feb 19, 2019

I had the same issue and was able to solve it by pulling the most recent tf-nightly-gpu image (with v1.13.0). See the comments here.

@DenceChen

I got the same issue with tf version 1.12.0. Does anyone know about this?

@ygean

ygean commented Mar 22, 2019

@YananJian Have u found a solution? I met the same problem, I need your help, thanks, pls.

@DenceChen

@zhouyuangan tf==1.9.0 will be ok!

@YananJian

> @YananJian Have u found a solution? I met the same problem, I need your help, thanks, pls.

Yes, tf==1.10.0 works.

@alexcpn

alexcpn commented May 24, 2019

Got the same issue while using tensorflow/serving:latest-gpu. Used the latest one, tested with three streams, and found that this problem is solved in tensorflow/serving:1.13-gpu.
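Since `latest-gpu` moves over time, pinning the image tag makes it clear which version is actually running. Below is a minimal sketch that assembles such a `docker run` command. The ports come from the log in the original report, while the `--runtime=nvidia` flag, the `/models/<name>` mount point, the function name and the example model name and path are assumptions for illustration.

```python
import shlex
from typing import List


def serving_command(model_name: str, model_dir: str, tag: str = "1.13-gpu") -> List[str]:
    """Assemble a docker run command for a pinned tensorflow/serving image."""
    return [
        "docker", "run", "--runtime=nvidia",           # assumes nvidia-docker2
        "-p", "8500:8500", "-p", "8501:8501",          # gRPC and REST ports
        "-v", f"{model_dir}:/models/{model_name}",     # assumed mount point
        "-e", f"MODEL_NAME={model_name}",
        f"tensorflow/serving:{tag}",                   # pinned tag, not latest-gpu
    ]


# Hypothetical model name and host path.
cmd = serving_command("detector", "/srv/models/detector")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be passed directly to `subprocess.run(cmd)`. The printed string is only for copying into a shell.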
