
Weird crash when using tensorflow C++ API on Android #22884

Closed
robinqhuang opened this issue Oct 11, 2018 · 14 comments
Labels
comp:lite TF Lite related issues

@robinqhuang

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Yes. I followed https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f to load a graph with the C++ API:

namespace tf = tensorflow;

bool
Inference::loadGraph( const std::vector<char>& graph )
{

tf::SessionOptions* options = new tf::SessionOptions();
tensorflow::Session* session = nullptr;
tf::Status status = tf::NewSession( *options, &session );
if ( status.ok() ) {
    m_session.reset( session );
} else {
    return false;
}

tf::GraphDef tensorflowGraph;

// The following graph-parsing code is copied from ReadBinaryProto in core/platform/env.cc
std::unique_ptr<::tensorflow::protobuf::io::CodedInputStream> coded_stream =
        std::unique_ptr<::tensorflow::protobuf::io::CodedInputStream>(
            new ::tensorflow::protobuf::io::CodedInputStream(
                    (const google::protobuf::uint8*)graph.data(),
                    (int)graph.size()));

if (!coded_stream.get()) {
    return false;
}

// Total bytes hard limit / warning limit are set to 1GB and 512MB
// respectively.
coded_stream->SetTotalBytesLimit(1024LL << 20, 512LL << 20);

if (!tensorflowGraph.ParseFromCodedStream(coded_stream.get())) {
    return false;
}

status = m_session->Create( tensorflowGraph );
if (!status.ok()) {
    return false;
}

#ifdef IOS
// Note: deleting this crashes in the SessionOptions destructor on Android
// for an unknown reason; all data members appear corrupt in the destructor.
// Using a stack variable (tf::SessionOptions options) crashes as well.
// The object may already have been deleted by someone else on Android,
// but there is no crash on iOS.
delete options;
#endif

return true;

}

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Ubuntu 18.04 or Mac OS

  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
    Samsung S9 or HuaWei P20.

  • TensorFlow installed from (source or binary):
    TensorFlow Mobile, built for Android from source with the makefile, following the official build steps:

./tensorflow/contrib/makefile/build_all_android.sh -a armeabi-v7a -s tensorflow/contrib/makefile/sub_makefiles/android/Makefile.in -t "libtensorflow_inference.so libtensorflow_demo.so all"

  • TensorFlow version (use command below):
    r1.10

  • Python version:
    3.5

  • Bazel version (if compiling from source):
    1.5.0 (not actually used; the makefile build was used instead)

  • GCC/Compiler version (if compiling from source):
    Android NDK r15c

  • CUDA/cuDNN version:
    Not enabled

  • GPU model and memory:

  • Exact command to reproduce:

Describe the problem

I created a set of C++ APIs for iOS and Android, with a JNI wrapper on top. I am not using the TensorFlow Java API because I want the same C++ API shared between iOS and Android.

Everything works on iOS.

As the comments in the code above mention, the SessionOptions object can't be deleted on Android: deleting it causes a crash. If the options object is not deleted, loading the graph works on Android as well. Please let me know what I did wrong. Thanks.

@ymodak ymodak added the comp:lite TF Lite related issues label Oct 11, 2018
@robinqhuang
Author

Interesting: whenever I create a tensorflow::SessionOptions, it crashes in its destructor on Android.
I can reproduce the crash with the following minimal code:

bool
Inference::loadGraph( const std::vector<char>& graph ) {
    tensorflow::SessionOptions options;
    return true;
}

@ymodak ymodak assigned rockyrhodes and unassigned ymodak Oct 11, 2018
@rockyrhodes rockyrhodes assigned jdduke and unassigned rockyrhodes Oct 15, 2018
@jdduke
Member

jdduke commented Oct 15, 2018

Does the crash stack trace reveal anything? Does it repro with both debug and optimized builds?

@jdduke jdduke added the stat:awaiting response Status - Awaiting response from author label Oct 15, 2018
@robinqhuang
Author

robinqhuang commented Oct 17, 2018

raw crash stack:

2018-10-16 17:48:36.604 29325-29325/? A/DEBUG: #5 pc 000cc505 /data/app/com.ml.example-lg00mRAO0Q31eDZwmSEdGg==/lib/arm/libml.so (_ZNSsD2Ev+60)
2018-10-16 17:48:36.604 29325-29325/? A/DEBUG: #6 pc 00091431 /data/app/com.ml.example-lg00mRAO0Q31eDZwmSEdGg==/lib/arm/libml.so (_ZN10tensorflow14SessionOptionsD2Ev+24)
2018-10-16 17:48:36.604 29325-29325/? A/DEBUG: #7 pc 00091453 /data/app/com.ml.example-lg00mRAO0Q31eDZwmSEdGg==/lib/arm/libml.so (ZNKSt14default_deleteIN10tensorflow14SessionOptionsEEclEPS1+18)
2018-10-16 17:48:36.604 29325-29325/? A/DEBUG: #8 pc 00090583 /data/app/com.ml.example-lg00mRAO0Q31eDZwmSEdGg==/lib/arm/libml.so (_ZNSt10unique_ptrIN10te

the decoded crash stack:

termediates/cmake/debug/obj/armeabi-v7a/libml.so 000cc505
__gnu_cxx::new_allocator::deallocate(char*, unsigned int)
/usr/local/google/buildbot/src/android/ndk-r15-release/out/build/tmp/build-170085/build-gnustl/static-armeabi-v7aj4.9/build/include/ext/new_allocator.h:116

====

It seems to crash when SessionOptions's target member (a std::string) is destructed.

@robinqhuang
Author

robinqhuang commented Oct 17, 2018

BTW, I was trying to build libtensorflow_cc.so, but I can't get it to build. Please see the last comment in #12747.

I also tried building libtensorflow_inference.so with bazel, but my application then has a lot of link errors (missing symbols; for example, the SessionOptions class is missing).

So in the end I built for Android with the build_all_android.sh / build_all_ios.sh scripts as mentioned above (this works well on iOS):
./tensorflow/contrib/makefile/build_all_android.sh -a armeabi-v7a -s tensorflow/contrib/makefile/sub_makefiles/android/Makefile.in -t "libtensorflow_inference.so libtensorflow_demo.so all"

Please let me know what I need to change in the makefile to build both optimized and debug variants.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 17, 2018
@jdduke
Member

jdduke commented Nov 2, 2018

@andrehentz have you seen anything like this before?

@jdduke
Member

jdduke commented Dec 4, 2018

Does this still repro with the latest 1.12 release?

@jdduke jdduke added the stat:awaiting response Status - Awaiting response from author label Dec 4, 2018
@robinqhuang
Author

Yes, it is still reproducible on the 1.12 release.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Dec 6, 2018
@jdduke
Member

jdduke commented Jan 11, 2019

The CMake-based build path for TensorFlow Mobile is a best effort project, and it's sadly unsurprising that it's out-of-sync with the proper source tree. I would strongly encourage you to try the bazel-based build path again. I was able to build successfully using NDK r15c:

bazel build --config=android_arm -c opt --cxxopt=--std=c++11 \
  tensorflow/contrib/android:libtensorflow_inference.so 

The relevant bazelrc section after running ./configure from the root checkout directory:

build --action_env ANDROID_NDK_HOME="$ANDROID_CODE_DIR/android-ndk-r15c"
build --action_env ANDROID_NDK_API_LEVEL="19"
build --action_env ANDROID_BUILD_TOOLS_VERSION="26.0.1"
build --action_env ANDROID_SDK_API_LEVEL="23"
build --action_env ANDROID_SDK_HOME="$ANDROID_CODE_DIR/android-sdk-linux"

@jdduke jdduke added the stat:awaiting response Status - Awaiting response from author label Jan 11, 2019
@robinqhuang
Author

robinqhuang commented Jan 11, 2019

The android_arm64 build failed, and we need arm64 support.

Also, it seems protobuf is not included in the generated .so file:

undefined reference to 'google::protobuf::io::CodedInputStream::SetTotalBytesLimit'

@jdduke
Member

jdduke commented Jan 11, 2019

What is the arm64 failure you're seeing? There's a known issue with certain NDK versions and arm64 builds where you need to change the NDK API version. See also issue #20192. In particular, if you change ANDROID_NDK_API_LEVEL in .tf_configure.bazelrc to 21, then --config=android_arm64 ought to build. Can you give that a try?

@robinqhuang
Author

TensorFlow builds for arm64 now.

But I still have a link issue in my app: undefined reference to 'google::protobuf::io::CodedInputStream::SetTotalBytesLimit'.

Please see my source code at the beginning.

BTW:
(1) How do I build with CMake? Any working branch is OK; I would like to give it a try.

(2) Is --config=android_x86 not supported?

(3) I know TF Mobile will be deprecated, but does tflite support RNNs already?

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Jan 12, 2019
@jdduke
Member

jdduke commented Jan 14, 2019

(1) How do I build with CMake? Any working branch is OK; I would like to give it a try.

At the moment, we don't have concrete plans to maintain CMake support indefinitely. As stated before, this has been best-effort. Of course, we'd be happy to accept PRs that fix the CMake build issue.

(2) Is --config=android_x86 not supported?

For TensorFlow Lite, yes. For TensorFlow Mobile, again, the situation isn't quite as clear. What issues are you seeing? Does it fail to compile?

(3) I know TF Mobile will be deprecated, but does tflite support RNNs already?

We are working on control flow and general RNN support. If there's a specific model you'd like to support, let us know, and we can be sure to prioritize it.

@robinqhuang
Author

It seems APIs are missing in the bazel build as proposed, so we have to stay with the makefile solution for now.

I am using bidirectional_dynamic_rnn, tf.contrib.rnn.LSTMCell, and tf.contrib.seq2seq. I hope all of them become available in tf.lite.

@jdduke
Member

jdduke commented Feb 7, 2019

I am using bidirectional_dynamic_rnn, tf.contrib.rnn.LSTMCell, and tf.contrib.seq2seq. I hope all of them become available in tf.lite.

Yep, this is being tracked separately. For now, there's not a lot of support we can give to the CMake build, but feel free to make a pull request if you see issues in the bazel build.

@jdduke jdduke closed this as completed Feb 7, 2019