Tensorflow 1.4 C++ API considerably slower than Python #15552
Comments
I will leave here some of the prediction times for different batch sizes: Batch Size | Python (s) | C++ (s)
Can you try to reproduce using the same model from both Python and C++, to narrow down the sources of the differences?
The difference in time between the optimized and non-optimized models is on the order of 100 ms in C++. The model is the same as the one used in Python, so the problem still persists. I will try a different model and post the results today.
I tried the Inception example and noticed that the C++ version performed better than the Python one, but not with a simple model consisting of a few convolutional, batch normalization, and dropout layers. The code is very similar to the one used in https://www.tensorflow.org/tutorials/image_recognition. Am I doing something wrong?
After running some warm-up runs in the session, I was able to massively improve the performance in C++. Problem solved.
Could you please let us know what kind of warm-up runs you did to speed it up? Thank you so much.
Hi @csytracy. I did some warm-up runs similar to the benchmark code: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/benchmark/benchmark_model.cc
@Goldesel23 What technique did you use to increase the speed in C++? Can you please share those tips?
If you follow the code in the samples you will get good speed. You just need to understand that the first batches will be slower. In my application I just run some warm-up runs at startup, after the session initialization.
@Goldesel23 Could you give me more details on the warm-up you used?
System information
Describe the problem
I was trying to run several models and evaluate their performance with different batch sizes in Python and C++, and noticed that the C++ API version is considerably slower than the Python one. Both were compiled with the same optimizations and with CUDA support.
When I try to predict the output of a single 256x256 image in Python it takes 0.5 seconds, whereas in the TensorFlow C++ API it takes 1.7 seconds. Note that in Python I was using a non-deployed model (without freezing and transforming the graph), whereas in C++ I did apply those transformations.
Does anyone know why this is happening? Is it because of the frozen and transformed graph?
I always thought the C++ API would be at least as fast as the Python version.