
Tensorflow 1.4 C++ API considerably slower than Python #15552

Closed
tensorfreitas opened this issue Dec 21, 2017 · 10 comments
Labels
stat:awaiting response Status - Awaiting response from author

Comments

@tensorfreitas

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow installed from (source or binary): source with all optimizations
  • TensorFlow version (use command below): 1.4
  • Python version: 3.5.2
  • Bazel version (if compiling from source): 0.8.1
  • GCC/Compiler version (if compiling from source): 5.4.0
  • CUDA/cuDNN version: 8.0 / 6.0
  • GPU model and memory: GTX960M

Describe the problem

I was trying to run several models and evaluate their performance with different batch sizes in Python and C++, and noticed that the C++ API version is considerably slower than the Python one. Both were compiled with the same optimizations and with CUDA support.

When I predict the output of a single 256x256 image in Python it takes 0.5 seconds, but the same prediction through the TensorFlow C++ API takes 1.7 seconds. Note that in Python I was using a non-deployed model (without freezing and transforming the graph), whereas in C++ I applied those transformations.

Does anyone know why this is happening? Is it because of the frozen and transformed graph?

I always thought the C++ API would be at least as fast as the Python version.


tensorfreitas commented Dec 21, 2017

I will leave here some of the times for the predictions with different batch sizes:

| Batch Size | Python (s) | C++ (s) |
| ---------- | ---------- | ------- |
| 1          | 0.5        | 1.7     |
| 32         | 0.6        | 1.8     |
| 128        | 0.9        | 2.2     |

@michaelisard

Can you try to repro using the same model from both Python and C++, to narrow down the sources of differences?

@michaelisard michaelisard added the stat:awaiting response Status - Awaiting response from author label Dec 21, 2017

tensorfreitas commented Dec 22, 2017

The difference in time between the optimized and unoptimized models is on the order of 100 ms in C++. The model is the same as the one used in Python, so the problem persists. I will try a different model and post the results today.


tensorfreitas commented Dec 22, 2017

I tried the Inception example and noticed that the C++ version performed better than the Python one. But with a simple model (a few convolutional layers plus batch normalization and dropout layers), the frozen model optimized via the graph transform tool has a considerably slower execution time.

The code is very similar to the one used in https://www.tensorflow.org/tutorials/image_recognition.

Am I doing something wrong?

@tensorfreitas

After running some warm-up runs through the session, I was able to massively improve the C++ performance.

Problem solved.

csytracy commented Feb 6, 2018

Could you please let us know what kind of warm up runs you did to speed it up? Thank you so much.

@tensorfreitas

Hi @csytracy. I did some warm-up runs similar to the benchmark_model code:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/benchmark/benchmark_model.cc

@abhigoku10

@Goldesel23 What technique did you use to increase the C++ speed? Could you please share those tips?
Thanks in advance

@tensorfreitas

If you follow the code in the samples you will get good speed. You just need to understand that the first batches will be slower. In my application I simply run some warm-up runs at startup, right after the session initialization.

@tangjie77wd

@Goldesel23 Could you share more details about the warm-up runs you used?
