
C++ api runs much slower than Python API (compile flags) #3471

Closed
lingz opened this issue Jul 22, 2016 · 6 comments
Labels
stat:awaiting response Status - Awaiting response from author

Comments

@lingz
Contributor

lingz commented Jul 22, 2016

Running my graph in Python takes only 6 seconds for one batch, but running the identical batch on the same graph (frozen with freeze_graph) through the C++ API takes 80 seconds. I'm guessing this 13x slowdown probably comes from using the wrong compiler flags during compilation. This is all running on CPU only.

I'm loading the graph the same way as in the label_image example.

I took a look at #2721 and added the -mavx compiler flag, which roughly doubled the speed, but it's still about 13x slower than Python.

The graph is mostly a large multi-layered regular RNN, with some feedforward layers as well.

Any ideas on how to get it to the same speed as Python? Is there somewhere I can see what flags TensorFlow was compiled with when installed from source?
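As a side note on picking flags: one quick way to see which SIMD instruction sets the CPU actually supports (and hence which `--copt` values are candidates) is to inspect /proc/cpuinfo on Linux. A sketch; the exact feature names reported vary by CPU:

```shell
# Print the SIMD-related feature names this CPU reports (Linux only).
# Any that appear can be passed to bazel, e.g. --copt=-mavx or --copt=-msse4.2.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(avx|avx2|fma|sse4_2)$' | sort -u
```

If a feature is missing from this list, passing the corresponding -m flag will produce a binary that crashes with an illegal-instruction error on that machine.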

Environment info

Operating System: Ubuntu 14.04, Linux 64-bit

Installed version of CUDA and cuDNN: None (CPU Only)
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

If installed from binary pip package, provide:

  1. Which pip package you installed.
    Linux 64 Bit CPU Python 3.5
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)".
    0.9.0

If installed from source, provide

  1. The commit hash (git rev-parse HEAD)
  2. The output of bazel version

Steps to reproduce

  1. Create graph in python
  2. freeze_graph.py
  3. Load graph in C++

What have you tried?

  1. adding -mavx C flag

Logs or other output that would be helpful

(If logs are large, please upload as attachment).

@concretevitamin
Contributor

Did you use the -c opt flag? I.e.

bazel -c opt --copt=-mavx build <...>

@concretevitamin concretevitamin added the stat:awaiting response Status - Awaiting response from author label Jul 22, 2016
@lingz
Contributor Author

lingz commented Jul 22, 2016

Ah, amazing: the run time went from 2 minutes to 4.5 seconds. However, note that you must pass the flags after the build keyword:

bazel build -c opt --copt=-mavx <...>

Maybe this should be added to documentation somewhere?

@lingz lingz closed this as completed Jul 22, 2016
@concretevitamin
Contributor

Right. "-c opt" means optimized build.


@lingz lingz changed the title C++ api runs much slower than Python API (what compile flags do I need?) C++ api runs much slower than Python API (compile flags) Nov 25, 2016
@venuktan

venuktan commented Aug 2, 2017

@lingz I am trying to run the exported .pb file in C++ and getting errors.
The .pb file works in Python but not in C++.
I am feeding the input as a cv::Mat:

cv::Mat frameC, dest;
static TF_Operation *placeholder = TF_GraphOperationByName(graph, "batch:0");
static TF_Operation *output_op = TF_GraphOperationByName(graph, "probability/class_idx");

for (;;)
{
    if (!capture.read(frameC))
    {
        std::cerr << "Failed to grab frame" << std::endl;
        continue;
    }
    cv::resize(frameC, dest, cv::Size(inputWidth, inputHeight));

    // TF_NewTensor expects a raw data pointer, so pass the Mat's
    // underlying buffer (dest.data), not the Mat object itself; the
    // Mat must also hold float data (CV_32F) to match TF_FLOAT.
    TF_Tensor *tensor = TF_NewTensor(TF_FLOAT, dims, 4, dest.data, size,
                                     &deallocator, nullptr);
    csession.SetInputs({{placeholder, tensor}});
    std::chrono::steady_clock::time_point beginRun = std::chrono::steady_clock::now();
    csession.Run(s);
}

Can you point me to how you ran it in C++?
Thank you

@jimaldon

I have an independent project that uses a Makefile and the TensorFlow shared object file instead of bazel to build. What's the g++ equivalent of -c opt here?

@CasiaFan

CasiaFan commented May 2, 2018

In my case, adding all the optimization options available for my CPU during the bazel build works for me:
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 //tensorflow:libtensorflow_cc.so
