C++ Const and Assign to initialize variable causes a segfault depending on the Const constructor used #18149

rajha-korithrien · 2018-03-31T18:09:53Z

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
Yes see a very short example below.
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
macOS 10.13.3 clang 900.0.39.2 and CentOS Linux 7 gcc-4.8.5
TensorFlow installed from (source or binary):
Source from the 1.7.0 release tag
TensorFlow version (use command below):
I have not actually installed the python pip package, but the source tree came from:
https://github.com/tensorflow/tensorflow/archive/v1.7.0.tar.gz
Python version:
N/A using the C++ API
Bazel version (if compiling from source):
macOS Build label: 0.11.1-homebrew and Centos Linux 7 Build label: 0.11.1- (@non-git)
GCC/Compiler version (if compiling from source):
macOS clang 900.0.39.2 and CentOS Linux 7 gcc-4.8.5
CUDA/cuDNN version:
N/A
GPU model and memory:
N/A
Exact command to reproduce:
extract the sources/configure
tar -xzvf v1.7.0.tar.gz
cd tensorflow-1.7.0
./configure

Then add the following directory to hold the work:
mkdir tensorflow/basic-example

Put into basic-example the following BUILD file:

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")

tf_cc_binary(
    name = "basic-example",
    srcs = [
        "basic-example.cc",
    ],
    deps = [
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow"
    ]
)

Put into basic-example the following C++ source file:

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

using namespace tensorflow;
using namespace tensorflow::ops;
using namespace std;

int main() {

  Scope scope = Scope::NewRootScope();
 
  auto c = Const(scope.WithOpName("const_c"), {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0}, {3,3});

  auto v = Variable(scope.WithOpName("var1"), {3, 3}, DT_FLOAT);
  auto init_v = Assign(scope.WithOpName("init_v"), v, c);

  std::vector<Tensor> outputs;
  ClientSession session(scope);

  TF_CHECK_OK(session.Run({init_v}, &outputs));
}

Now compile and run the resulting program:
bazel build -c dbg //tensorflow/basic-example
./bazel-bin/tensorflow/basic-example/basic-example

Observe the following behavior:

./bazel-bin/tensorflow/basic-example/basic-example
2018-03-31 11:47:57.135532: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX
Segmentation fault: 11

Describe the problem

The code given above causes a segfault when the session runner tries to get the name of a node because the node is nullptr. I have included a stacktrace using lldb below (a trace showing the same information can be created using gdb on Linux).

However the following slightly modified C++ program works fine:

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

using namespace tensorflow;
using namespace tensorflow::ops;
using namespace std;

int main() {

  std::vector<float> initConstData = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0};

  Scope scope = Scope::NewRootScope();

  Tensor initConstT(DT_FLOAT, TensorShape({3,3}));
  std::copy_n(initConstData.begin(), initConstData.size(), initConstT.flat<float>().data());

  auto c = Const(scope.WithOpName("const_c"), initConstT);

  auto v = Variable(scope.WithOpName("var1"), {3, 3}, DT_FLOAT);
  auto init_v = Assign(scope.WithOpName("init_v"), v, c);

  std::vector<Tensor> outputs;
  ClientSession session(scope);

  TF_CHECK_OK(session.Run({init_v}, &outputs));
}

The difference between the code that works and the code that doesn't:
a) the explicit creation of a tensor initConstT
b) calling Const with a Tensor rather than an Input::Initializer

The behavior is identical if I omit the use of scope.WithOpName and just pass scope.
I have been able to test this back as far as Tensorflow 1.4 I can not build Tensorflow 1.3.1 with my installed version of bazel.

If I have done something wrong, please point it out. Otherwise I feel that because there is no semantic difference between the two programs and the API allows the former program to compile then they should both work.

Source code / logs

Stacktrace of the problem:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x60)
  * frame #0: 0x0000000126e677bc libtensorflow_framework.so`tensorflow::Node::name() const [inlined] std::__1::shared_ptr<tensorflow::NodeProperties>::operator->(this=0x0000000000000060) const at memory:4071
    frame #1: 0x0000000126e677bc libtensorflow_framework.so`tensorflow::Node::name(this=0x0000000000000000) const at graph.cc:140
    frame #2: 0x000000010018592f basic-example`tensorflow::Output::name(this=0x000000012bc020f0) const at ops.h:76
    frame #3: 0x0000000100184e7a basic-example`tensorflow::ClientSession::Run(this=0x00007ffeefbff4a8, run_options=0x00007ffeefbfefd0, inputs=size=0, fetch_outputs=size=1, run_outputs=size=1, outputs=0x00007ffeefbff4b0 size=1, run_metadata=0x0000000000000000) const at client_session.cc:118
    frame #4: 0x0000000100184145 basic-example`tensorflow::ClientSession::Run(this=0x00007ffeefbff4a8, inputs=size=0, fetch_outputs=size=1, run_outputs=size=1, outputs=0x00007ffeefbff4b0 size=1) const at client_session.cc:89
    frame #5: 0x000000010018408a basic-example`tensorflow::ClientSession::Run(this=0x00007ffeefbff4a8, fetch_outputs=size=1, outputs=0x00007ffeefbff4b0 size=1) const at client_session.cc:76
    frame #6: 0x0000000100002bfc basic-example`main at basic-example.cc:22
    frame #7: 0x00007fff76249115 libdyld.dylib`start + 1
    frame #8: 0x00007fff76249115 libdyld.dylib`start + 1
(lldb)

The text was updated successfully, but these errors were encountered:

davidscherer · 2018-06-07T17:39:48Z

If, before the call to session.Run(), you do something like this:

    if (!scope.ok()) {
        LOG(FATAL) << scope.status().ToString();
        abort();
    }

then you will get a more helpful error message:

Invalid argument: Inconsistent values for attr 'T' DT_FLOAT vs. DT_DOUBLE while building NodeDef 'Model/init_v' using Op<name=Assign; signature=ref:Ref(T), value:T -> output_ref:Ref(T); attr=T:type; attr=validate_shape:bool,default=true; attr=use_locking:bool,default=true; allows_uninitialized_input=true>

But frankly I am more concerned about the SIGSEGV and lack of diagnostics. What I have discovered in trying to use the Tensorflow C++ API is that as soon as you construct an operation with a shape or type error, scope.ok() becomes false. Any subsequent operations added to the graph bail out of their constructors immediately, leaving them with node()==nullptr. Then any call to Run() using these operations results in a segfault, as does Output::name() and probably other things that depend on a valid node.

If it's intended that the user always check explicitly for scope errors before calling Run(), on penalty of undefined behavior, it seems to me that that should be reflected in the documentation and examples for the C++ API. For example, in the first example at https://www.tensorflow.org/api_guides/cc/guide if you modify the matrix A to not be rectangular (like auto A = Const(root, { {3.f, 2.f}, {-1.f} });) you will get a segfault rather than the error message "Invalid argument: Initializer list components should all have the same shape".

Ideally I think calling Run() in this scenario should result in an error status rather than a segfault! Surely it would be simple to check session.ok() there. I hesitate to offer architectural advice on a project I'm so new to, but it might be better not to initialize Operations to having a NULL node at all.

rajha-korithrien · 2018-06-22T19:37:57Z

@davidscherer Thanks for your explanation! My apologies for my late response. With your added information I know what is going on now.

std::vector<float> initConstData = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0};

Forces C++ to hold the numbers as floats rather than doubles. So when the following is executed:

Tensor initConstT(DT_FLOAT, TensorShape({3,3}));
std::copy_n(initConstData.begin(), initConstData.size(), initConstT.flat<float>().data());
auto c = Const(scope.WithOpName("const_c"), initConstT);

The underlying data held in initConstT is float data. However when we use the implicit initialization code:

auto c = Const(scope.WithOpName("const_c"), {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0}, {3,3});

The complier creates the initializer array as doubles instead of floats, which in turn means that 'c' holds double data. This causes a problem here:

auto v = Variable(scope.WithOpName("var1"), {3, 3}, DT_FLOAT);
auto init_v = Assign(scope.WithOpName("init_v"), v, c);

Because the tensorflow variable 'v' is defined to hold DT_FLOAT, but 'c' is holding DT_DOUBLE due to the implicit initializer array.

Thank you for your help in figuring it out!

I agree completely that the error was caused and known at the time of creating the init_v operation and the C++ API could raise an exception then and there much like the Java API does.
If the API designers do not want to use the exception feature of C++, then I also agree with your approach of having session.Run() check the scope object for the user, rather than allowing a segfault occur.

koriavinash1 · 2019-01-29T09:41:04Z

can anyone share an example for weight initialization in multi-layered networks?

tensorflowbutler assigned michaelisard Apr 1, 2018

michaelisard assigned skye and unassigned michaelisard Apr 10, 2018

skye added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 18, 2018

mohantym self-assigned this Jan 14, 2023

mohantym added comp:runtime c++ runtime, performance issues (cpu) 1.9 type:bug Bug labels Feb 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C++ Const and Assign to initialize variable causes a segfault depending on the Const constructor used #18149

C++ Const and Assign to initialize variable causes a segfault depending on the Const constructor used #18149

rajha-korithrien commented Mar 31, 2018

davidscherer commented Jun 7, 2018 •

edited

rajha-korithrien commented Jun 22, 2018 •

edited

koriavinash1 commented Jan 29, 2019

C++ Const and Assign to initialize variable causes a segfault depending on the Const constructor used #18149

C++ Const and Assign to initialize variable causes a segfault depending on the Const constructor used #18149

Comments

rajha-korithrien commented Mar 31, 2018

System information

Describe the problem

Source code / logs

davidscherer commented Jun 7, 2018 • edited

rajha-korithrien commented Jun 22, 2018 • edited

koriavinash1 commented Jan 29, 2019

davidscherer commented Jun 7, 2018 •

edited

rajha-korithrien commented Jun 22, 2018 •

edited