contrib/makefile: No session factory registered for the given session options #3308

Closed
Ape opened this Issue Jul 14, 2016 · 21 comments

Ape commented Jul 14, 2016

Environment info

Operating System: Arch Linux 64-bit
GCC: 6.1.1
no CUDA or cuDNN used

TensorFlow was installed from the Git repo sources. I tried v0.9.0 and the current master (c129591).

Steps to reproduce

1. Build contrib/makefile (download_dependencies.sh + make)
2. Link the produced tensorflow-core library to a C++ project
3. Try to create a new TensorFlow session:

Session* session;
Status status = NewSession(SessionOptions(), &session);

4. Compile succeeds, but running the code gives:

E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "" config: } Registered factories are {}.
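
For reference, a self-contained version of the snippet in step 3 (a minimal sketch, assuming the standard public C++ headers from the TensorFlow source tree are on the include path; not part of the original report):

#include <iostream>

#include "tensorflow/core/public/session.h"

int main() {
  tensorflow::Session* session = nullptr;
  tensorflow::Status status =
      tensorflow::NewSession(tensorflow::SessionOptions(), &session);
  if (!status.ok()) {
    // With the makefile-built static library, this prints the
    // "No session factory registered" error shown above.
    std::cerr << status.ToString() << std::endl;
    return 1;
  }
  delete session;
  return 0;
}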

tensorflow/contrib/makefile/gen/bin/benchmark seems to be working.

Ape commented Jul 14, 2016

This may be related to #3309, since that also occurs sometimes for me.

Ape commented Jul 14, 2016

The same issue also occurs with the C API and the following code:

TF_Status* status = TF_NewStatus();
TF_SessionOptions* options = TF_NewSessionOptions();
TF_Session* session = TF_NewSession(options, status);

Ape commented Jul 14, 2016

This works perfectly (both C++ and C API) if I build libtensorflow.so with bazel (instead of using contrib/makefile) and then link to bazel-bin/tensorflow/libtensorflow.so.

Ape commented Jul 15, 2016

This also reproduces on Ubuntu 16.04.

Ape commented Jul 15, 2016

This also happens on Raspberry Pi (Raspbian Jessie), but #3309 didn't occur there.

Zappau commented Jul 15, 2016

Reproduced on Mac OS X (10.11) with the current master, but only when using contrib/makefile.

jmchen-g commented Jul 18, 2016

Could you provide the details of the errors you get on Ubuntu 16?

Ape commented Jul 18, 2016

This occurs in exactly the same way on Ubuntu 16.04 64-bit with GCC 5.3.1. With exactly the same commands and code, it gives this error at runtime:

E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "" config: } Registered factories are {}.

jmchen-g commented Jul 18, 2016

@martinwicke Could you take a look at this please?

petewarden commented Jul 19, 2016

This is usually a sign that the global constructors that TensorFlow uses to register things like session factories and kernels have been stripped out. The short answer is that you should make sure you build with the right linker options to stop them from being stripped: on Linux add -Wl,--allow-multiple-definition -Wl,--whole-archive, and on OS X add -all_load. The benchmark binary works because its build does this, and the makefile should pick the right combination for your platform.
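
For example, on Linux the link line needs to wrap the TensorFlow archive in those flags (a sketch, assuming the makefile build produced gen/lib/libtensorflow-core.a and that protobuf and pthread are your other dependencies; adjust names and paths to your project):

g++ -std=c++11 -o my_app my_app.o \
  -Wl,--allow-multiple-definition \
  -Wl,--whole-archive \
    tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a \
  -Wl,--no-whole-archive \
  -lprotobuf -lpthread -lm

The -Wl,--no-whole-archive after the archive turns the option back off, so it only forces in the TensorFlow objects and not every library that follows.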

The longer explanation is that TensorFlow uses a C++ pattern like this for registering classes (all in pseudo-code):

--- register.h ---
...
template <typename T>
class Register {
 public:
  explicit Register(const string& name) {
    g_registry[name] = T::Factory();
  }
};
...
--- ---

--- some_class.cc ---
...
Register<SomeObject> g_some_register_object("SomeObject");
--- ---

The idea is that the g_some_register_object global will be created when the library is loaded, which calls the constructor, which adds the factory function to the list of registered classes. That allows subsequent code to ask for a "SomeObject" by name from the registry and get back an object created by the factory function.

The advantage of this approach is that the registration of objects is distributed, so you only have to register a class in the file that it's implemented rather than editing a global list of factories somewhere else. When it works, it's pretty magical.
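
To make the mechanism concrete, here is a self-contained sketch of both halves, registration and lookup (the names here are illustrative, not TensorFlow's actual registration code):

#include <functional>
#include <map>
#include <memory>
#include <string>

// Base class for everything the registry knows how to create.
class SomeBase {
 public:
  virtual ~SomeBase() = default;
};

// Global registry: class name -> factory function.
using Factory = std::function<std::unique_ptr<SomeBase>()>;
std::map<std::string, Factory>& Registry() {
  static std::map<std::string, Factory> registry;
  return registry;
}

// Constructing a global Register<T> object registers T as a side-effect.
template <typename T>
struct Register {
  explicit Register(const std::string& name) {
    Registry()[name] = [] { return std::unique_ptr<SomeBase>(new T()); };
  }
};

// In some_class.cc:
class SomeObject : public SomeBase {};
static Register<SomeObject> g_some_register_object("SomeObject");

// Later, any code can ask for an instance by name:
std::unique_ptr<SomeBase> Create(const std::string& name) {
  auto it = Registry().find(name);
  if (it == Registry().end()) return nullptr;
  return it->second();
}

Create("SomeObject") only works if g_some_register_object was actually constructed, which is where the linker comes in.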

Unfortunately many linkers see the g_some_register_object global, notice that no other code ever reads or writes it, and so it can be removed without affecting the program at all. What they don't realize is that its constructor has an important side-effect, registering the factory function.

This is a common problem, so most linkers have some way of turning off this stripping optimization, but it's usually a pretty indiscriminate switch and so may cause your binary size to be larger than it needs to be.

Does that help?

Ape commented Jul 20, 2016

Thanks. Adding -Wl,--allow-multiple-definition -Wl,--whole-archive did fix the "No session factory registered" issue, but now I get another one:

Failed to evaluate TensorFlow: Invalid argument: No OpKernel was registered to support Op 'Inv' with these attrs
[[Node: dropout/Inv = Inv[T=DT_FLOAT](keep_prob)]]

Note that the exact same code is working with bazel, but fails with contrib/makefile.

I think this might still be related to this issue since the new error mentions No OpKernel was registered.

In addition, while -Wl,--whole-archive fixes the first error, it is suboptimal. I have several small binaries that are all using libtensorflow. This linker flag increases the binary size for each of them by around 180 MB. It is huge compared to the minimal size I get without the flag.

petewarden commented Jul 20, 2016

No OpKernel was registered to support Op 'Inv'

This is actually a different error. If you look at tensorflow/contrib/makefile/tf_op_files.txt you'll see the list of kernel files that are included by default, and it doesn't include tensorflow/core/kernels/cwise_op_inverse.cc, which defines the Inv kernel.

This is by design: we picked a minimal subset of kernels that we expect to be used for inference, skipping those that are likely to be used only for training, since inference is the focus of our mobile build. That's not always an easy thing to estimate, however, and here we've missed one that you need.

The short-term answer is that you can make local changes to tf_op_files.txt to add cwise_op_inverse.cc and fix your immediate problem (and do the same for any other kernels you're missing). The better solution is to document more clearly which ops are supported, and to offer an option to build with the full set of ops instead of the current subset.
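
For example, the workflow for adding the missing kernel looks roughly like this (a sketch, run from the repository root; the rebuild step mirrors the normal contrib/makefile build):

# Append the missing kernel source to the makefile build's file list, then rebuild.
echo "tensorflow/core/kernels/cwise_op_inverse.cc" >> tensorflow/contrib/makefile/tf_op_files.txt
make -f tensorflow/contrib/makefile/Makefile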

This linker flag increases the binary size for each of them by around 180 MB. It is huge compared to the minimal size I get without the flag.

Is this still true if you supply -Os to the whole build process? We tend to see a smaller increase outside of debug builds. You're right, though: as I mention above, this is a pretty indiscriminate switch, and it would be good to figure out a nicer solution.

Ape commented Jul 21, 2016

Is this still true if you supply -Os to the whole build process? We tend to see a smaller increase outside of debug builds.

It seems that -Os does decrease the binary size to 112 MB, but that's still around 100 times larger than it could be.

Ape commented Jul 21, 2016

Thanks for the help. I needed to include all these:

tensorflow/core/kernels/cwise_op_floor.cc
tensorflow/core/kernels/cwise_op_inverse.cc
tensorflow/core/kernels/random_op.cc

Compiling and running my programs works now. You might want to add documentation for this, and investigate whether there are better solutions than requiring --whole-archive.

petewarden commented Sep 28, 2016

Thanks for the update. Since this seems resolved, closing this issue.

petewarden closed this Sep 28, 2016

waleedka commented Nov 14, 2016

I'm having the same issue. Compiling on Ubuntu 16.04 using bazel 0.4.0 and the master branch as of today. I tried the linker options mentioned above (-Wl,--allow-multiple-definition -Wl,--whole-archive) but that didn't help. Still getting the same error message: No session factory registered for the given session options.

My BUILD file is:

cc_binary(
    name = "label_image",
    srcs = [
        "main.cc",
    ],
    copts = ["-Iexternal/org_tensorflow"],
    linkopts = ["-lm", "-Wl,--allow-multiple-definition", "-Wl,--whole-archive"],
    deps = [
        "@org_tensorflow//tensorflow/cc:cc_ops",
    ],
)

corwinliu commented Sep 14, 2017

@waleedka
I have the same issue, compiling on Ubuntu 16.04 with bazel 0.5.4. I tried the linker options as well, but they didn't work either. Have you managed to solve this issue?

theSparta commented Oct 9, 2017

I am able to use the static C++ TensorFlow library in a standalone C++ program, but not in a qmake project that reuses that standalone code in other files. Running the program that qmake compiles successfully, I get the error No session factory registered for the given session options. I am passing the linker options -Wl,--allow-multiple-definition and -Wl,--whole-archive, but to no effect. Also, if it helps, the standalone program works with the static library without these linker options.

jart commented Oct 14, 2017

@corwinliu @waleedka Chances are you guys needed //tensorflow/core:direct_session on the deps list. But if you're building C++ programs with Bazel, it actually gets a little more complicated than that. Here's an example PNG -> JPG converter program built by Bazel to explain: https://gist.github.com/jart/c8b84a17ffa9e8e5dfa30342b8feca7a It's statically linked and ends up being about 22 MB with no dependencies (11 MB stripped). We should be able to make that much smaller in the future, once we do some code cleanup on the alwayslink=1 targets.
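
Applied to the BUILD file quoted earlier in this thread, that would look roughly like this (a sketch; only the direct_session dependency comes from this comment, the rest mirrors the earlier file):

cc_binary(
    name = "label_image",
    srcs = ["main.cc"],
    copts = ["-Iexternal/org_tensorflow"],
    linkopts = ["-lm"],
    deps = [
        "@org_tensorflow//tensorflow/cc:cc_ops",
        # Pulls in the DirectSession registration so NewSession() can find a factory.
        "@org_tensorflow//tensorflow/core:direct_session",
    ],
)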

corwinliu commented Oct 20, 2017

@jart Thanks a lot. I found that the TensorFlow Serving project (https://github.com/tensorflow/serving) is a good example for learning how to build a C++ program against the TensorFlow headers.

scm-ns commented Feb 9, 2019

Use ldd on your binary to figure out whether the library dependencies are actually being added.
This varies with the Linux flavor you are using, but some versions of ld will not link a library unless it is used.
In the TensorFlow case, tensorflow_cc was not being linked into my binary even though I specified -ltensorflow_cc.
The solution was to add the -Wl,--no-as-needed flag to gcc. GCC processes flags left to right, so specify this flag before you add -ltensorflow*.
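
Concretely, the link line would look something like this (a sketch; the exact TensorFlow library name depends on how you built it), and you can confirm the result with the ldd check below:

# --no-as-needed must come before the TensorFlow library it should apply to.
g++ -o my_app my_app.o -Wl,--no-as-needed -ltensorflow_cc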

ldd <binary_name> | grep tensorflow
