Hello! I am using batching and am running
And I run everything by doing
The GPU is a Tesla P100 and the system has 8 CPU cores running TensorFlow.
When I send 1000 concurrent requests to the server, it takes around 35 seconds without batching. With batching, it takes nearly the same time, about 34.5 seconds.
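For reference, a load test like the one described above might look roughly like the sketch below; the model name `my_model`, the input key, and the REST port 8501 are assumptions, not details from the original post.

```python
import json
import numpy as np
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8501/v1/models/my_model:predict"  # assumed model name and REST port

def one_request(_):
    # One image per request; TF Serving's REST API expects {"instances": [...]}
    image = np.random.rand(224, 224, 3).tolist()
    body = json.dumps({"instances": [image]})
    return requests.post(URL, data=body).status_code

# 1000 concurrent requests, as in the measurement above
with ThreadPoolExecutor(max_workers=100) as pool:
    statuses = list(pool.map(one_request, range(1000)))
print(statuses.count(200), "requests succeeded")
```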
I know that the
I saw some other posts mention that building
Any advice would be great!
I also ran into this problem and did a lot of testing, and now I can get some benefit from batching, going from under 200 images per GPU to nearly 500 images per GPU.
I guess you are probably sending 1 image per request (or maybe some other kind of data). If so, setting
You can set
To make full use of the GPU, you can form the batch on the client side, i.e., send 16 or 32 images per request. This is much more efficient than forming the batch on the server side (see the sketch below).
And here is a related post; you may also find some help there.
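As an illustration, a client-side batch sent over gRPC might look like the following sketch; the model name `my_model`, signature `serving_default`, input key `images`, and input shape are assumptions and would need to match the actual exported model.

```python
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build one request carrying 32 images instead of 32 separate requests
batch = np.random.rand(32, 224, 224, 3).astype(np.float32)  # assumed input shape
request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                   # assumed model name
request.model_spec.signature_name = "serving_default"  # assumed signature
request.inputs["images"].CopyFrom(tf.make_tensor_proto(batch))

response = stub.Predict(request, 30.0)  # 30 second timeout
print(list(response.outputs.keys()))
```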
For CPU, one can set batch_timeout_micros to 0. Then experiment with batch_timeout_micros values in the 1-10 millisecond (1000-10000 microsecond) range.
Since your scenario involves a GPU, please see the approach below.
Did you find any way to resolve this problem?
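As a rough illustration, a GPU-oriented batching parameters file (passed to the model server with `--enable_batching` and `--batching_parameters_file`) might look like the sketch below; the specific values are assumed starting points for tuning, not recommendations from this thread.

```
# Hypothetical starting values; tune max_batch_size and batch_timeout_micros for your model
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 8 }
```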
I have the same problem.
I have tested my model predicting image embeddings: it takes only 0.12 s for 50 images with batch size 50. But when I convert the Keras model to a TensorFlow SavedModel and serve it with TensorFlow Serving, it takes 3 s to calculate the same embeddings.
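For context, a Keras-to-SavedModel export of the kind described might look like this sketch; the model file name and versioned export path are assumptions.

```python
import tensorflow as tf

# Load the trained Keras embedding model (path is hypothetical)
model = tf.keras.models.load_model("embedding_model.h5")

# Export to the versioned directory layout that TensorFlow Serving expects
tf.saved_model.save(model, "export/embedding_model/1")
```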
My batch.config file