Tensorflow model server crashing with error : Check failed: c->in_use(); tf_serving_entrypoint.sh: line 3: 8 Aborted #1206

dipsatch · 2018-12-08T01:23:30Z

I am running an object detection model using the tensorflow/serving:latest-gpu Docker image and nvidia-docker on an Amazon Deep Learning AMI (EC2 P3 instance). The model server starts up fine. I then run a gRPC client that loops through several images and sends them to the server to fetch predictions. The predictions are as expected and fast, and the server runs at ~95% GPU utilization (memory used is below limits).

However, the model server often crashes after serving predictions continuously for a while. The error it prints right before crashing is:

F external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:458] Check failed: c->in_use() && (c->bin_num == kInvalidBinNum) 
/usr/bin/tf_serving_entrypoint.sh: line 3: 8 Aborted tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

I have tried sending larger payloads from the client to the server and have observed resource exhaustion errors, which makes sense since the GPU runs out of memory. But I cannot work out what exactly is causing the issue above.

Can someone please help?

Thanks in advance.
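
Editor's sketch, related to the larger-payload observation above: one way to keep a client from triggering resource exhaustion is to cap how much it sends per request. The helper below is a minimal, stdlib-only illustration. The name `bounded_batches` and both limits are hypothetical and are not part of TensorFlow Serving or its client API. It does not address the `c->in_use()` crash itself.

```python
from typing import Iterator, List, Sequence


def bounded_batches(
    items: Sequence[bytes], max_items: int, max_bytes: int
) -> Iterator[List[bytes]]:
    """Yield consecutive batches capped by item count and total byte size.

    Hypothetical helper: an item larger than max_bytes is still sent,
    alone in its own batch, rather than being dropped.
    """
    if max_items < 1 or max_bytes < 1:
        raise ValueError("max_items and max_bytes must be positive")

    batch: List[bytes] = []
    batch_bytes = 0
    for item in items:
        size = len(item)
        # Close the current batch if adding this item would exceed a cap.
        if batch and (len(batch) >= max_items or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    # Stand-in payloads; a real client would use encoded image bytes.
    images = [b"a" * 4, b"b" * 4, b"c" * 4, b"d" * 10]
    for batch in bounded_batches(images, max_items=2, max_bytes=8):
        print(len(batch), sum(len(x) for x in batch))
```

Each yielded batch would then be sent as one request, so the per-request payload stays under limits chosen for the GPU in use.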


gautamvasudevan · 2018-12-10T23:18:15Z

That appears to come from the memory allocator when it tries to free memory that it thinks has already been freed (or is still in use for some reason). This could be a bug in code (say, a memory leak) or simply a side effect of running out of memory for other reasons.

There's not enough information here to debug anything further, though. This is deep in TensorFlow core logic, so if you can reproduce the issue, you might want to file an issue on the TensorFlow project.
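
Editor's sketch: to make the failed check easier to picture, here is a toy allocator that refuses to free a chunk that is not marked in use. This is not TensorFlow's BFC allocator, and it ignores bins entirely. The class and method names are invented for illustration, and the real check aborts the process rather than raising an exception.

```python
class Chunk:
    """A fixed-size region with an in-use flag (toy model)."""

    def __init__(self, chunk_id: int, size: int) -> None:
        self.chunk_id = chunk_id
        self.size = size
        self.in_use = False


class ToyAllocator:
    """Hypothetical first-fit allocator over a fixed list of chunks."""

    def __init__(self, sizes) -> None:
        self.chunks = [Chunk(i, s) for i, s in enumerate(sizes)]

    def allocate(self, size: int) -> int:
        # Hand out the first free chunk that is large enough.
        for chunk in self.chunks:
            if not chunk.in_use and chunk.size >= size:
                chunk.in_use = True
                return chunk.chunk_id
        raise MemoryError("no free chunk of at least %d bytes" % size)

    def deallocate(self, chunk_id: int) -> None:
        chunk = self.chunks[chunk_id]
        # Analogue of the in_use() part of the failed check: freeing a
        # chunk that is not marked in use means the bookkeeping is corrupt.
        if not chunk.in_use:
            raise RuntimeError("Check failed: chunk %d is not in use" % chunk_id)
        chunk.in_use = False


if __name__ == "__main__":
    alloc = ToyAllocator([64, 128])
    cid = alloc.allocate(100)
    alloc.deallocate(cid)
    try:
        alloc.deallocate(cid)  # second free of the same chunk
    except RuntimeError as err:
        print(err)
```

In this toy model the error can only come from a double free. In a real multi-threaded server the same symptom could also follow from a race or other corruption, which is consistent with the comment that there is not enough information to say more.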

echan00 · 2018-12-15T08:57:35Z

@dipsatch your problem is definitely related to memory (whether in your own code or in TensorFlow itself). I had the same issue and saw that memory usage was at capacity right before the TF server crashed.
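
Editor's sketch: to check whether GPU memory is at capacity before a crash, one option is to poll `nvidia-smi` from a small script. The code below assumes `nvidia-smi` is on the PATH and uses its CSV query output. The function names and the 0.95 threshold are hypothetical choices, not anything prescribed in this thread.

```python
import subprocess
from typing import List, Tuple

# nvidia-smi query that prints one "used, total" line (in MiB) per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text: str) -> List[Tuple[int, int]]:
    """Parse lines such as '15079, 16160' into (used_mib, total_mib) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def near_capacity(rows: List[Tuple[int, int]], threshold: float = 0.95) -> List[bool]:
    """Return, per GPU, whether used/total meets the (assumed) threshold."""
    return [used / total >= threshold for used, total in rows]


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi once and return the parsed rows (requires a GPU host)."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_memory_csv(result.stdout)


if __name__ == "__main__":
    # Made-up sample values, used here so the sketch runs without a GPU.
    sample = "15900, 16160\n100, 16160\n"
    print(near_capacity(parse_memory_csv(sample)))
```

Calling `query_gpu_memory()` in a loop alongside the client and logging the result would show whether memory climbs toward the limit before the server aborts.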

Harshini-Gadige · 2019-01-15T00:31:20Z

Closing this issue as no response was received from the user. Feel free to post updates (if any) and we will reopen the issue.

YananJian · 2019-01-16T09:03:04Z

I'm having the same issue. The server reaches ~95% utilization and crashes after a few iterations of training. I'm using TF version 1.12.0.
The error I'm getting is:

F tensorflow/core/common_runtime/bfc_allocator.cc:458] Check failed: c->in_use() && (c->bin_num == kInvalidBinNum) Aborted

schen119 · 2019-02-05T21:55:31Z

This exception might be related to TensorFlow issue #22581: https://github.com/tensorflow/tensorflow/issues/22581

wronk · 2019-02-19T02:22:50Z

I had the same issue and was able to solve it by pulling the most recent tf-nightly-gpu image (with v1.13.0). See the comments here.

DenceChen · 2019-03-19T03:24:00Z

I got the same issue with tf-version=1.12.0. Does anyone know about this?

ygean · 2019-03-22T01:39:27Z

@YananJian Have you found a solution? I ran into the same problem and need your help. Thanks.

DenceChen · 2019-03-22T01:48:27Z

@zhouyuangan tf==1.9.0 will be ok!

YananJian · 2019-03-26T02:57:12Z

> @YananJian Have you found a solution? I ran into the same problem and need your help. Thanks.

Yes, tf==1.10.0 works.

alexcpn · 2019-05-24T03:53:51Z

Got the same issue while using tensorflow/serving:latest-gpu. I used the latest one, tested with three streams, and found that this problem is solved in tensorflow/serving:1.13-gpu.

ymodak self-assigned this Dec 10, 2018

ymodak added the stat:awaiting response label Dec 12, 2018

Harshini-Gadige closed this as completed Jan 15, 2019

nieksand mentioned this issue Feb 19, 2019

kInvalidBinNum error triton-inference-server/server#99

Closed
