C++ Const and Assign to initialize variable causes a segfault depending on the Const constructor used #18149
Comments
If, before the call to session.Run(), you do something like this:
then you will get a more helpful error message:
But frankly I am more concerned about the SIGSEGV and the lack of diagnostics. What I have discovered in trying to use the TensorFlow C++ API is that as soon as you construct an operation with a shape or type error, scope.ok() becomes false. Any subsequent operations added to the graph bail out of their constructors immediately, leaving them with node()==nullptr. Then any call to Run() using these operations results in a segfault, as does Output::name() and probably other things that depend on a valid node.
If it's intended that the user always check explicitly for scope errors before calling Run(), on penalty of undefined behavior, it seems to me that this should be reflected in the documentation and examples for the C++ API. For example, in the first example at https://www.tensorflow.org/api_guides/cc/guide if you modify the matrix A to not be rectangular (like
Ideally I think calling Run() in this scenario should result in an error status rather than a segfault! Surely it would be simple to check session.ok() there. I hesitate to offer architectural advice on a project I'm so new to, but it might be better not to initialize Operations to having a NULL node at all.
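To make the failure mode in this comment concrete, here is a minimal standalone sketch of the pattern. It is not TensorFlow code: Scope, Op, Node, MakeOp, Run, and RunResult are hypothetical stand-ins invented for illustration. It assumes only what the comment describes: one bad op marks the scope as failed, later ops bail out with a null node, and the suggested improvement is for Run() to return an error status instead of dereferencing that null node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// All types below are hypothetical stand-ins, not TensorFlow classes.
struct Node { std::string name; };

struct Scope {
  bool ok = true;                            // flips to false on the first error
  std::string error;                         // message for the first error
  std::vector<std::unique_ptr<Node>> nodes;  // owns every node that was built
};

struct Op {
  Node* node = nullptr;  // stays null if construction bailed out
};

// Build an op. If the scope has already failed, or the arguments are invalid,
// return an op whose node is null (the situation described in the comment).
Op MakeOp(Scope& scope, const std::string& name, bool valid_args) {
  Op op;
  if (!scope.ok) return op;  // earlier error: bail out immediately
  if (!valid_args) {
    scope.ok = false;
    scope.error = "invalid arguments for op '" + name + "'";
    return op;
  }
  scope.nodes.push_back(std::unique_ptr<Node>(new Node{name}));
  op.node = scope.nodes.back().get();
  return op;
}

struct RunResult { bool ok; std::string message; };

// Check the scope and the node before touching the node, so a failed scope
// yields an error result rather than a null dereference.
RunResult Run(const Scope& scope, const Op& fetch) {
  if (!scope.ok) return {false, scope.error};
  if (fetch.node == nullptr) return {false, "fetch has no node"};
  return {true, "ran " + fetch.node->name};
}
```

In this model, an op built after a failed one has node == nullptr even though its own arguments are fine, and Run() reports the first recorded error. In real code against the C++ API, the equivalent defensive step is the explicit scope check before session.Run() that this comment recommends.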
@davidscherer Thanks for your explanation! My apologies for my late response. With your added information I know what is going on now.
std::vector<float> initConstData = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0};
forces C++ to hold the numbers as floats rather than doubles. So when the following is executed:
Tensor initConstT(DT_FLOAT, TensorShape({3,3}));
std::copy_n(initConstData.begin(), initConstData.size(), initConstT.flat<float>().data());
auto c = Const(scope.WithOpName("const_c"), initConstT);
the underlying data held in initConstT is float data. However, when we use the implicit initialization code:
auto c = Const(scope.WithOpName("const_c"), {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0}, {3,3});
the compiler creates the initializer array as doubles instead of floats, which in turn means that 'c' holds double data. This causes a problem here:
auto v = Variable(scope.WithOpName("var1"), {3, 3}, DT_FLOAT);
auto init_v = Assign(scope.WithOpName("init_v"), v, c);
because the TensorFlow variable 'v' is defined to hold DT_FLOAT, but 'c' is holding DT_DOUBLE due to the implicit initializer array.
Thank you for your help in figuring it out! I agree completely that the error was caused and known at the time of creating the init_v operation, and the C++ API could raise an exception then and there, much like the Java API does.
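The float versus double deduction described above can be reproduced in plain C++ without TensorFlow. The sketch below is only an illustration: DType, DTypeOf, and deduced_dtype are hypothetical names, standing in for whatever mapping from C++ element type to tensor dtype a templated initializer-list constructor performs.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for a dtype enum; not TensorFlow's DataType.
enum class DType { Float, Double };

// Map a C++ element type to the stand-in dtype.
template <typename T> struct DTypeOf;
template <> struct DTypeOf<float>  { static constexpr DType value = DType::Float; };
template <> struct DTypeOf<double> { static constexpr DType value = DType::Double; };

// Deduce the element type T from a braced list, the way a templated
// initializer-list parameter would, and report the matching dtype.
template <typename T>
DType deduced_dtype(std::initializer_list<T>) {
  return DTypeOf<T>::value;
}
```

Calling deduced_dtype({1.0, 2.0, 3.0}) deduces T as double, because unsuffixed floating-point literals have type double, while deduced_dtype({1.0f, 2.0f, 3.0f}) deduces float. By the same reasoning, writing the literals in the Const call with an f suffix would presumably give a float constant that matches the DT_FLOAT variable, though that has not been verified against the 1.7.0 sources here.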
Can anyone share an example of weight initialization in multi-layered networks?
System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
Yes, see the very short example below.
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
macOS 10.13.3 clang 900.0.39.2 and CentOS Linux 7 gcc-4.8.5
TensorFlow installed from (source or binary):
Source from the 1.7.0 release tag
TensorFlow version (use command below):
I have not actually installed the python pip package, but the source tree came from:
https://github.com/tensorflow/tensorflow/archive/v1.7.0.tar.gz
Python version:
N/A using the C++ API
Bazel version (if compiling from source):
macOS Build label: 0.11.1-homebrew and Centos Linux 7 Build label: 0.11.1- (@non-git)
GCC/Compiler version (if compiling from source):
macOS clang 900.0.39.2 and CentOS Linux 7 gcc-4.8.5
CUDA/cuDNN version:
N/A
GPU model and memory:
N/A
Exact command to reproduce:
Extract the sources and run configure:
tar -xzvf v1.7.0.tar.gz
cd tensorflow-1.7.0
./configure
Then add the following directory to hold the work:
mkdir tensorflow/basic-example
Put into basic-example the following BUILD file:
Put into basic-example the following C++ source file:
Now compile and run the resulting program:
bazel build -c dbg //tensorflow/basic-example
./bazel-bin/tensorflow/basic-example/basic-example
Observe the following behavior:
Describe the problem
The code given above causes a segfault when the session runner tries to get the name of a node because the node is nullptr. I have included a stacktrace using lldb below (a trace showing the same information can be created using gdb on Linux).
However the following slightly modified C++ program works fine:
The difference between the code that works and the code that doesn't:
a) the explicit creation of a tensor initConstT
b) calling Const with a Tensor rather than an Input::Initializer
The behavior is identical if I omit the use of scope.WithOpName and just pass scope.
I have been able to reproduce this as far back as TensorFlow 1.4. I cannot build TensorFlow 1.3.1 with my installed version of bazel.
If I have done something wrong, please point it out. Otherwise, I feel that because there is no semantic difference between the two programs, and the API allows the former program to compile, they should both work.
Source code / logs
Stacktrace of the problem: