Very slow performance on extracting grpc request results using .float_val #1725

Closed
denisb411 opened this issue Aug 27, 2020 · 8 comments
Assignees: rmothukuru
Labels: stale, stat:awaiting response, type:performance (Performance Issue)

Comments


denisb411 commented Aug 27, 2020

For some reason, extracting the results via .float_val takes an extremely long time.

Example scenario along with its output:

# Imports assumed by this snippet (they are not shown in the original report):
import time

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# self.serving_grpc_port and self.max_total_detections are attributes of the
# surrounding class; imgs_array is the NumPy batch of input images.
t2 = time.time()
options = [('grpc.max_receive_message_length', 100 * 4000 * 4000)]
channel = grpc.insecure_channel('{host}:{port}'.format(host='localhost', port=str(self.serving_grpc_port)), options=options)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
request = predict_pb2.PredictRequest()
request.model_spec.name = 'ivi-detector'
request.model_spec.signature_name = 'serving_default'

request.inputs['inputs'].CopyFrom(tf.make_tensor_proto(imgs_array, shape=imgs_array.shape))
res = stub.Predict(request, 100.0)  # 100-second timeout

print("Time to detect:")
t3 = time.time(); print("t3:", t3 - t2)

# Time the extraction of each output tensor's repeated float_val field.
t11 = time.time()
boxes_float_val = res.outputs['detection_boxes'].float_val
t12 = time.time(); print("t12:", t12 - t11)
classes_float_val = res.outputs['detection_classes'].float_val
t13 = time.time(); print("t13:", t13 - t12)
scores_float_val = res.outputs['detection_scores'].float_val
t14 = time.time(); print("t14:", t14 - t13)

boxes = np.reshape(boxes_float_val, [len(imgs_array), self.max_total_detections, 4])
classes = np.reshape(classes_float_val, [len(imgs_array), self.max_total_detections])
scores = np.reshape(scores_float_val, [len(imgs_array), self.max_total_detections])
t15 = time.time(); print("t15:", t15 - t14)
Output:
Time to detect:
t3: 1.4687104225158691
t12: 1.9140026569366455
t13: 3.719329833984375e-05
t14: 9.298324584960938e-06
t15: 0.0008063316345214844

TensorFlow Serving is running an object detection model from TensorFlow's Object Detection API (faster_rcnn_resnet101). As we can see, extracting the boxes found by the detection takes longer than the prediction itself.

The current shape of the detected boxes is [batch_size, 100, 4], with 100 being the maximum number of detections.
As a workaround I can lower the maximum number of detections, which significantly decreases the time needed to extract these values, but it still stays unnecessarily high (from my point of view).

I'm using tensorflow-serving 2.3.0-gpu as a Docker container along with tensorflow-serving-api==2.3.0.
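
A minimal sketch of an alternative extraction path, assuming the response proto res and the imgs_array batch from the snippet above are in scope: tf.make_ndarray converts a whole output TensorProto to a NumPy array in one call, which is one alternative to reading the repeated float_val field; whether it is actually faster in this setup would still need to be measured.

import numpy as np
import tensorflow as tf

# Sketch: convert each output TensorProto to a NumPy array in one call
# instead of reading the repeated float_val field element by element.
boxes = tf.make_ndarray(res.outputs['detection_boxes'])      # shape follows tensor_shape, e.g. [batch, 100, 4]
classes = tf.make_ndarray(res.outputs['detection_classes'])  # e.g. [batch, 100]
scores = tf.make_ndarray(res.outputs['detection_scores'])    # e.g. [batch, 100]

# Without TensorFlow on the client, a single explicit-dtype copy of float_val
# followed by a reshape is another option:
boxes_alt = np.asarray(res.outputs['detection_boxes'].float_val, dtype=np.float32)
boxes_alt = boxes_alt.reshape(len(imgs_array), -1, 4)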

@rmothukuru rmothukuru self-assigned this Aug 28, 2020
@rmothukuru rmothukuru added the type:performance (Performance Issue) label Aug 28, 2020
@rmothukuru

@denisb411,
In order to expedite the troubleshooting process, please provide a code snippet to reproduce the issue reported here. Thanks!

@rmothukuru

@denisb411,
Can you please respond to the above comment? Thanks!


denisb411 commented Sep 11, 2020

@rmothukuru Sorry for the delayed response.
I started writing some code to reproduce this and got a very good result using the pretrained weights from the tf-od api repo. Knowing this, I suspected the way I was exporting my model's frozen graph (it's a model trained by transfer learning from the weights I mentioned) was the problem, so I manually exported the frozen graph from the original weights (those .ckpt files) to see if the result was the same, and it was: it was as fast as the .pb that came with the download.
Now I suspect that something is wrong/different with the model I trained... but why? Does it make sense that it would affect the float cast?

The code I used (with fast results):
https://github.com/denisb411/tfserving-od/blob/master/inference-using-tfserving-docker.ipynb

I don't know how to proceed, as my custom training follows almost the same pipeline.config as the original, so there's nothing different in the training process.
How can I fix this? How is it related to the .float_val attribute, if there's any relation at all?
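
A hedged sketch of one way to check whether the two exports declare different output dtypes/shapes (the export_dir path below is hypothetical, and this assumes a TF 2.x client loading the exported saved_model directory):

import tensorflow as tf

export_dir = '/path/to/export/saved_model'  # hypothetical path to the exported model
loaded = tf.saved_model.load(export_dir)
serving_fn = loaded.signatures['serving_default']

# Print each declared output name with its dtype and shape, so the retrained
# export can be compared against the pretrained one.
for name, tensor in serving_fn.structured_outputs.items():
    print(name, tensor.dtype, tensor.shape)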

@denisb411

@rmothukuru @yimingz-a any updates on this?

@singhniraj08

@denisb411,

The performance issue with .float_val looks like an issue with saving and loading the model's weights rather than with TF Serving. I would suggest following this article, which walks through deploying an object detection model on TF Serving.

Since the issue doesn't occur with the pre-trained weights, I would suggest comparing the PredictResponse object from your trained model with the one from the model with pre-trained weights. If the issue persists, please create a new question on StackOverflow with the tags "tensorflow" and "object-detection".

Thank you!
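
As a hedged illustration of that comparison (response_custom and response_pretrained below are hypothetical PredictResponse objects obtained from the two models; only standard TensorProto fields are read):

from tensorflow.core.framework import types_pb2

def describe_response(res):
    # Summarize each output tensor in a PredictResponse: declared dtype, shape,
    # and which value field actually carries the data.
    for name, proto in res.outputs.items():
        shape = [d.size for d in proto.tensor_shape.dim]
        print(name,
              'dtype:', types_pb2.DataType.Name(proto.dtype),
              'shape:', shape,
              'tensor_content bytes:', len(proto.tensor_content),
              'float_val length:', len(proto.float_val))

describe_response(response_custom)      # response from the retrained model
describe_response(response_pretrained)  # response from the pretrained model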

@github-actions

github-actions bot commented Apr 6, 2023

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Apr 6, 2023
@github-actions

This issue was closed due to lack of activity after being marked stale for the past 7 days.

@google-ml-butler

Are you satisfied with the resolution of your issue?
