
Weird crash when using tensorflow C++ API on Android #22884

Closed
robinqhuang opened this issue Oct 11, 2018 · 14 comments
Labels
comp:lite TF Lite related issues

@robinqhuang

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Yes. I followed https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f to load a graph with the C++ API:

namespace tf = tensorflow;

bool
Inference::loadGraph( const std::vector<char>& graph )
{

tf::SessionOptions* options = new tf::SessionOptions();
tensorflow::Session* session = nullptr;
tf::Status status = tf::NewSession( *options, &session );
if ( status.ok() ) {
    m_session.reset( session );
} else {
    return false;
}

tf::GraphDef tensorflowGraph;

// The following graph-parsing code is copied from ReadBinaryProto in core/platform/env.cc
std::unique_ptr<::tensorflow::protobuf::io::CodedInputStream> coded_stream =
        std::unique_ptr<::tensorflow::protobuf::io::CodedInputStream>(
            new ::tensorflow::protobuf::io::CodedInputStream(
                    (const google::protobuf::uint8*)graph.data(),
                    (int)graph.size()));

if (!coded_stream.get()) {
    return false;
}

// Total bytes hard limit / warning limit are set to 1GB and 512MB
// respectively.
coded_stream->SetTotalBytesLimit(1024LL << 20, 512LL << 20);

if (!tensorflowGraph.ParseFromCodedStream(coded_stream.get())) {
    return false;
}

status = m_session->Create( tensorflowGraph );
if (!status.ok()) {
    return false;
}

#ifdef IOS
// Note: deleting this crashes in the SessionOptions destructor on Android
// for an unknown reason; all data members appear corrupt in the destructor.
// Using a stack variable (tf::SessionOptions options) crashes as well.
// The object may already have been deleted by someone else on Android,
// but there is no crash on iOS.
delete options;
#endif

return true;

}

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Ubuntu 18.04 or Mac OS

  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
    Samsung S9 or HuaWei P20.

  • TensorFlow installed from (source or binary):
    TensorFlow Mobile, built for Android from source with the makefile, following the official build steps:

./tensorflow/contrib/makefile/build_all_android.sh -a armeabi-v7a -s tensorflow/contrib/makefile/sub_makefiles/android/Makefile.in -t "libtensorflow_inference.so libtensorflow_demo.so all"

  • TensorFlow version (use command below):
    r1.10

  • Python version:
    3.5

  • Bazel version (if compiling from source):
    1.5.0 (not actually used; the makefile build was used instead)

  • GCC/Compiler version (if compiling from source):
    Android NDK r15c

  • CUDA/cuDNN version:
    Not enabled

  • GPU model and memory:

  • Exact command to reproduce:

Describe the problem

I created a set of C++ APIs for iOS and Android, with a JNI wrapper on top. I am not using the TensorFlow Java API because I want the same C++ API shared between iOS and Android.

Everything works on iOS.

As the comments in the code above mention, the SessionOptions object can't be deleted on Android: deleting it causes a crash. If the options object is not deleted, loading the graph works on Android as well. Please let me know what I did wrong. Thanks.

@ymodak ymodak added the comp:lite TF Lite related issues label Oct 11, 2018
@robinqhuang
Author

Interesting: whenever I create a tensorflow::SessionOptions, it crashes in its destructor on Android.
I can reproduce the crash with the following minimal code:

bool
Inference::loadGraph( const std::vector<char>& graph ) {
    tensorflow::SessionOptions options;
    return true;
}

@ymodak ymodak assigned rockyrhodes and unassigned ymodak Oct 11, 2018
@rockyrhodes rockyrhodes assigned jdduke and unassigned rockyrhodes Oct 15, 2018
@jdduke
Member

jdduke commented Oct 15, 2018

Does the crash stack trace reveal anything? Does it repro with both debug and optimized builds?

@jdduke jdduke added the stat:awaiting response Status - Awaiting response from author label Oct 15, 2018
@robinqhuang
Author

robinqhuang commented Oct 17, 2018

raw crash stack:

2018-10-16 17:48:36.604 29325-29325/? A/DEBUG: #5 pc 000cc505 /data/app/com.ml.example-lg00mRAO0Q31eDZwmSEdGg==/lib/arm/libml.so (_ZNSsD2Ev+60)
2018-10-16 17:48:36.604 29325-29325/? A/DEBUG: #6 pc 00091431 /data/app/com.ml.example-lg00mRAO0Q31eDZwmSEdGg==/lib/arm/libml.so (_ZN10tensorflow14SessionOptionsD2Ev+24)
2018-10-16 17:48:36.604 29325-29325/? A/DEBUG: #7 pc 00091453 /data/app/com.ml.example-lg00mRAO0Q31eDZwmSEdGg==/lib/arm/libml.so (ZNKSt14default_deleteIN10tensorflow14SessionOptionsEEclEPS1+18)
2018-10-16 17:48:36.604 29325-29325/? A/DEBUG: #8 pc 00090583 /data/app/com.ml.example-lg00mRAO0Q31eDZwmSEdGg==/lib/arm/libml.so (_ZNSt10unique_ptrIN10te

the decoded crash stack:

termediates/cmake/debug/obj/armeabi-v7a/libml.so 000cc505
__gnu_cxx::new_allocator::deallocate(char*, unsigned int)
/usr/local/google/buildbot/src/android/ndk-r15-release/out/build/tmp/build-170085/build-gnustl/static-armeabi-v7aj4.9/build/include/ext/new_allocator.h:116

====

It seems to crash when SessionOptions's target member (a std::string) is destructed.

@robinqhuang
Author

robinqhuang commented Oct 17, 2018

BTW, I was trying to build libtensorflow_cc.so, but I can't get it to build. Please see the last comment in #12747.

I also tried building libtensorflow_inference.so with bazel, but my application then has a lot of link errors (missing symbols; for example, the SessionOptions class is missing).

So in the end I built for Android with the build_all_android.sh / build_all_ios.sh scripts as mentioned above (this works well on iOS):
./tensorflow/contrib/makefile/build_all_android.sh -a armeabi-v7a -s tensorflow/contrib/makefile/sub_makefiles/android/Makefile.in -t "libtensorflow_inference.so libtensorflow_demo.so all"

Please let me know what I need to change in the makefile to build both optimized and debug variants.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 17, 2018
@jdduke
Member

jdduke commented Nov 2, 2018

@andrehentz have you seen anything like this before?

@jdduke
Member

jdduke commented Dec 4, 2018

Does this still repro with the latest 1.12 release?

@jdduke jdduke added the stat:awaiting response Status - Awaiting response from author label Dec 4, 2018
@robinqhuang
Author

Yes, it is still reproducible on the 1.12 release.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Dec 6, 2018
@jdduke
Member

jdduke commented Jan 11, 2019

The CMake-based build path for TensorFlow Mobile is a best effort project, and it's sadly unsurprising that it's out-of-sync with the proper source tree. I would strongly encourage you to try the bazel-based build path again. I was able to build successfully using NDK r15c:

bazel build --config=android_arm -c opt --cxxopt=--std=c++11 \
  tensorflow/contrib/android:libtensorflow_inference.so 

The relevant bazelrc section after running ./configure from the root checkout directory:

build --action_env ANDROID_NDK_HOME="$ANDROID_CODE_DIR/android-ndk-r15c"
build --action_env ANDROID_NDK_API_LEVEL="19"
build --action_env ANDROID_BUILD_TOOLS_VERSION="26.0.1"
build --action_env ANDROID_SDK_API_LEVEL="23"
build --action_env ANDROID_SDK_HOME="$ANDROID_CODE_DIR/android-sdk-linux"

@jdduke jdduke added the stat:awaiting response Status - Awaiting response from author label Jan 11, 2019
@robinqhuang
Author

robinqhuang commented Jan 11, 2019

The android_arm64 build failed, and we need arm64 support.

Also, it seems protobuf is not included in the generated .so file:

undefined reference to 'google::protobuf::io::CodedInputStream::SetTotalBytesLimit'

@jdduke
Member

jdduke commented Jan 11, 2019

What is the arm64 failure you're seeing? There's a known issue with certain NDK versions and arm64 builds where you need to change the NDK API version. See also issue #20192. In particular, if you change ANDROID_NDK_API_LEVEL in .tf_configure.bazelrc to 21, then --config=android_arm64 ought to build. Can you give that a try?

@robinqhuang
Author

TensorFlow builds for arm64 now.

But I still have a link issue in my app: undefined reference to 'google::protobuf::io::CodedInputStream::SetTotalBytesLimit'.

Please see my source code at the beginning.

BTW:
(1) How do I build with CMake? Any working branch is OK; I would like to give it a try.

(2) Is --config=android_x86 not supported?

(3) I know TF Mobile will be deprecated, but does tflite support RNNs already?

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Jan 12, 2019
@jdduke
Member

jdduke commented Jan 14, 2019

(1) How do I build with CMake? Any working branch is OK; I would like to give it a try.

At the moment, we don't have concrete plans to maintain CMake support indefinitely. As stated before, this has been best-effort. Of course, we'd be happy to accept PRs that fix the CMake build issue.

(2) Is --config=android_x86 not supported?

For TensorFlow Lite, yes. For TensorFlow Mobile, again, the situation isn't quite as clear. What issues are you seeing? Does it fail to compile?

(3) I know TF Mobile will be deprecated, but does tflite support RNNs already?

We are working on control flow and general RNN support. If there's a specific model you'd like to support, let us know, and we can be sure to prioritize it.

@robinqhuang
Author

It seems APIs are missing in the bazel build as proposed, so we have to stay with the makefile solution for now.

I am using bidirectional_dynamic_rnn, tf.contrib.rnn.LSTMCell, and tf.contrib.seq2seq. I hope all of them become available in tf.lite.

@jdduke
Member

jdduke commented Feb 7, 2019

I am using bidirectional_dynamic_rnn, tf.contrib.rnn.LSTMCell, and tf.contrib.seq2seq. I hope all of them become available in tf.lite.

Yep, this is being tracked separately. For now, there's not a lot of support we can give to the CMake build, but feel free to make a pull request if you see issues in the bazel build.

@jdduke jdduke closed this as completed Feb 7, 2019