segfault in FlexDelegate on android #38025

Closed
particlebbq opened this issue Mar 29, 2020 · 22 comments
Labels
comp:lite TF Lite related issues type:bug Bug

Comments

@particlebbq

I'm hoping to run a custom TensorFlow/TFLite model (one that uses TFLite's select ops) on-device in an Android app. My understanding is that I need to configure the TFLite interpreter with a FlexDelegate, but when I try to do this (on the Android Studio emulator), the app segfaults, apparently in the FlexDelegate constructor. I've reproduced the crash in a minimal example, which I link to and describe below.

Thanks in advance for any help on this, and thanks also to all the devs for creating tensorflow!

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Only a little. I've added a call to the FlexDelegate constructor in the MainActivity of the default Flutter app that Android Studio generates for a new project.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): The crash happens in an Android phone emulator; the host machine running the emulator runs Gentoo Linux.
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: The Android emulator that Android Studio provides (I've tested a few configurations, including API 27, 29, and R, with both x86 and x86_64 ABIs).
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below):
    in app/build.gradle,
    implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
    implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly'

Describe the current behavior
The app crashes while constructing a FlexDelegate instance in the emulator. (I don't have a physical device handy, so I can't currently test whether it also happens on real hardware.)

Describe the expected behavior
FlexDelegate should be created without a segfault.

Standalone code to reproduce the issue
The line that crashes is

FlexDelegate delegate = new FlexDelegate();

which I've added to the configureFlutterEngine method of the app's MainActivity. I've put the code for the full example app in this repository:

https://github.com/particlebbq/tflite_bug_report

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

The error message in the logcat is:
2020-03-29 16:22:01.392 11042-11042/com.example.tflitebugreport A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xfffffff4 in tid 11042 (tflitebugreport), pid 11042 (tflitebugreport)
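
For readers triaging a crash line like the one above, here is a minimal sketch of pulling the fields out of it in plain Java. The class name `FatalSignalLine` and its methods are hypothetical, and the regular expression assumes the exact libc message format shown in this report. Reinterpreting the fault address as a signed 32-bit value is only arithmetic on the logged number; by itself it says nothing definite about the root cause.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses the libc "Fatal signal" message seen in this report.
final class FatalSignalLine {
    // Assumed format: "Fatal signal N (NAME), code C (CODENAME), fault addr 0xHEX ..."
    private static final Pattern FATAL = Pattern.compile(
        "Fatal signal (\\d+) \\((\\w+)\\), code (-?\\d+) \\((\\w+)\\), fault addr 0x([0-9a-fA-F]+)");

    final int signal;        // e.g. 11
    final String signalName; // e.g. SIGSEGV
    final String codeName;   // e.g. SEGV_MAPERR
    final long faultAddr;    // fault address as an unsigned value

    private FatalSignalLine(int signal, String signalName, String codeName, long faultAddr) {
        this.signal = signal;
        this.signalName = signalName;
        this.codeName = codeName;
        this.faultAddr = faultAddr;
    }

    /** Returns the parsed fields, or null if the line contains no fatal-signal message. */
    static FatalSignalLine parse(String line) {
        Matcher m = FATAL.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new FatalSignalLine(
            Integer.parseInt(m.group(1)),
            m.group(2),
            m.group(4),
            Long.parseUnsignedLong(m.group(5), 16));
    }

    /** The low 32 bits of the fault address, viewed as a signed int. */
    int faultAddrAsInt32() {
        return (int) faultAddr;
    }

    public static void main(String[] args) {
        FatalSignalLine f = parse(
            "A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), "
            + "fault addr 0xfffffff4 in tid 11042 (tflitebugreport), pid 11042 (tflitebugreport)");
        // prints "SIGSEGV SEGV_MAPERR -12"
        System.out.println(f.signalName + " " + f.codeName + " " + f.faultAddrAsInt32());
    }
}
```

The address 0xfffffff4 equals 2^32 - 12, so its signed 32-bit view is -12.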

@particlebbq particlebbq added the type:bug Bug label Mar 29, 2020
@amahendrakar amahendrakar added the comp:lite TF Lite related issues label Mar 30, 2020
@amahendrakar amahendrakar assigned ymodak and unassigned amahendrakar Mar 30, 2020
@ymodak ymodak assigned jdduke and unassigned ymodak Mar 30, 2020
@jdduke
Member

jdduke commented Mar 31, 2020

The first thing to check is that you're using the latest nightly builds. Can you try manually clearing your gradle cache or change

implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly'

to

implementation('org.tensorflow:tensorflow-lite:0.0.0-nightly') { changing = true }
implementation('org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly') { changing = true }

?

@particlebbq
Author

particlebbq commented Mar 31, 2020

Ok, I've tried switching to the "changing=true" syntax and rerunning, and after that I tried clearing the gradle cache (by removing ~/.gradle/caches) and I reran again. In both cases, the segfault was still there.

@jdduke
Member

jdduke commented Mar 31, 2020

I see, thanks for checking. It's possible this is an emulator-specific issue. We'll take a look.

@particlebbq
Author

Ok, thanks! If there are any other checks I can do that would be helpful, I'm happy to give them a shot; just let me know.

@jdduke
Member

jdduke commented Mar 31, 2020

If you can attach the logcat preceding the seg fault, that might be useful.

@particlebbq
Author

Ok, the logcat from the last run I did follows.

2020-03-31 14:31:28.646 18654-18654/? I/tflitebugrepor: Not late-enabling -Xcheck:jni (already on)
2020-03-31 14:31:28.659 18654-18654/? I/tflitebugrepor: Unquickening 13 vdex files!
2020-03-31 14:31:28.660 18654-18654/? W/tflitebugrepor: Unexpected CPU variant for X86 using defaults: x86
2020-03-31 14:31:28.894 18654-18654/com.example.tflitebugreport I/tflitebugrepor: The ClassLoaderContext is a special shared library.
2020-03-31 14:31:28.959 18654-18675/com.example.tflitebugreport I/ResourceExtractor: Found extracted resources res_timestamp-1-1585678399005
2020-03-31 14:31:29.007 18654-18678/com.example.tflitebugreport D/libEGL: loaded /vendor/lib/egl/libEGL_emulation.so
2020-03-31 14:31:29.008 18654-18678/com.example.tflitebugreport D/libEGL: loaded /vendor/lib/egl/libGLESv1_CM_emulation.so
2020-03-31 14:31:29.009 18654-18678/com.example.tflitebugreport D/libEGL: loaded /vendor/lib/egl/libGLESv2_emulation.so
2020-03-31 14:31:29.035 18654-18654/com.example.tflitebugreport D/HostConnection: HostConnection::get() New Host Connection established 0xe9c48910, tid 18654
2020-03-31 14:31:29.046 18654-18654/com.example.tflitebugreport D/HostConnection: HostComposition ext ANDROID_EMU_CHECKSUM_HELPER_v1 ANDROID_EMU_native_sync_v2 ANDROID_EMU_native_sync_v3 ANDROID_EMU_native_sync_v4 ANDROID_EMU_dma_v1 ANDROID_EMU_direct_mem ANDROID_EMU_host_composition_v1 ANDROID_EMU_host_composition_v2 ANDROID_EMU_vulkan ANDROID_EMU_deferred_vulkan_commands ANDROID_EMU_vulkan_null_optional_strings ANDROID_EMU_vulkan_create_resources_with_requirements ANDROID_EMU_YUV_Cache ANDROID_EMU_async_unmap_buffer ANDROID_EMU_vulkan_ignored_handles GL_OES_vertex_array_object GL_KHR_texture_compression_astc_ldr ANDROID_EMU_gles_max_version_2
2020-03-31 14:31:29.061 18654-18654/com.example.tflitebugreport D/EGL_emulation: eglCreateContext: 0xe9bd3280: maj 2 min 0 rcv 2
2020-03-31 14:31:29.143 18654-18681/com.example.tflitebugreport D/HostConnection: HostConnection::get() New Host Connection established 0xe9c48c80, tid 18681
2020-03-31 14:31:29.145 18654-18681/com.example.tflitebugreport D/HostConnection: HostComposition ext ANDROID_EMU_CHECKSUM_HELPER_v1 ANDROID_EMU_native_sync_v2 ANDROID_EMU_native_sync_v3 ANDROID_EMU_native_sync_v4 ANDROID_EMU_dma_v1 ANDROID_EMU_direct_mem ANDROID_EMU_host_composition_v1 ANDROID_EMU_host_composition_v2 ANDROID_EMU_vulkan ANDROID_EMU_deferred_vulkan_commands ANDROID_EMU_vulkan_null_optional_strings ANDROID_EMU_vulkan_create_resources_with_requirements ANDROID_EMU_YUV_Cache ANDROID_EMU_async_unmap_buffer ANDROID_EMU_vulkan_ignored_handles GL_OES_vertex_array_object GL_KHR_texture_compression_astc_ldr ANDROID_EMU_gles_max_version_2
2020-03-31 14:31:29.226 18654-18681/com.example.tflitebugreport D/EGL_emulation: eglMakeCurrent: 0xe9bd3280: ver 2 0 (tinfo 0xe7a8ea00)
2020-03-31 14:31:29.328 18654-18687/com.example.tflitebugreport I/flutter: Observatory listening on http://127.0.0.1:34829/x57mhTHSbR4=/
2020-03-31 14:31:29.448 18654-18654/com.example.tflitebugreport I/MainActivity: Made it here!
2020-03-31 14:31:29.498 18654-18654/com.example.tflitebugreport W/native: cpu_feature_guard.cc:36 The TensorFlow library was compiled to use SSE instructions, but these aren't available on your machine.
2020-03-31 14:31:29.498 18654-18654/com.example.tflitebugreport A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xfffffff4 in tid 18654 (tflitebugreport), pid 18654 (tflitebugreport)

@jdduke
Member

jdduke commented Mar 31, 2020

2020-03-31 14:31:29.498 18654-18654/com.example.tflitebugreport W/native: cpu_feature_guard.cc:36 The TensorFlow library was compiled to use SSE instructions, but these aren't available on your machine.

That looks potentially suspicious, though it doesn't immediately address the seg fault. It's very helpful though, we'll try to get back to you soon.
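
To check other logs for the same pattern, a sketch along the following lines could scan captured logcat output for a `cpu_feature_guard` SSE warning that precedes a SIGSEGV in the same process. The class and method names are hypothetical, and the prefix pattern assumes the Android Studio logcat layout used in the log above (date, time, pid-tid/package, level/tag); lines without that prefix are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: correlates the SSE warning with a later segfault by pid.
final class SseCrashScan {
    // Assumed layout: "<date> <time> <pid>-<tid>/<package> <level>/<tag>: <message>"
    private static final Pattern PREFIX = Pattern.compile(
        "^\\S+ \\S+ (\\d+)-(\\d+)/\\S+ ([VDIWEA])/([^:]+): (.*)$");

    /** Returns pids that logged the SSE warning and later a fatal SIGSEGV, in order of crash. */
    static List<Integer> pidsWithSseWarningBeforeSegv(List<String> lines) {
        Set<Integer> warned = new LinkedHashSet<>();
        Set<Integer> crashed = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = PREFIX.matcher(line);
            if (!m.matches()) {
                continue; // not in the assumed logcat layout
            }
            int pid = Integer.parseInt(m.group(1));
            String message = m.group(5);
            if (message.contains("cpu_feature_guard") && message.contains("SSE")) {
                warned.add(pid);
            } else if (message.contains("Fatal signal 11 (SIGSEGV)") && warned.contains(pid)) {
                crashed.add(pid);
            }
        }
        return new ArrayList<>(crashed);
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "2020-03-31 14:31:29.448 18654-18654/com.example.tflitebugreport I/MainActivity: Made it here!",
            "2020-03-31 14:31:29.498 18654-18654/com.example.tflitebugreport W/native: cpu_feature_guard.cc:36 "
                + "The TensorFlow library was compiled to use SSE instructions, but these aren't available on your machine.",
            "2020-03-31 14:31:29.498 18654-18654/com.example.tflitebugreport A/libc: Fatal signal 11 (SIGSEGV), "
                + "code 1 (SEGV_MAPERR), fault addr 0xfffffff4 in tid 18654 (tflitebugreport), pid 18654 (tflitebugreport)");
        // prints "[18654]"
        System.out.println(pidsWithSseWarningBeforeSegv(log));
    }
}
```

A match only shows that the two messages occur together in one process; it does not establish that the warning explains the crash.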

@particlebbq
Author

Thanks! I agree that the "SSE instructions" message does look suspicious.

One more observation, in case it helps: I notice that if I change the FlexDelegate to an NnApiDelegate, then I don't see a segfault, and I don't get the 'SSE instructions' message either.

@abattery
Contributor

abattery commented Apr 2, 2020

I can reproduce the problem on my side as well. I'll look into the root cause.

@terryheo
Member

FYI, you can disable the use of SSE4 instructions by passing --copt="-mno-sse4" to the bazel build command.

@terryheo terryheo self-assigned this Apr 28, 2020
@fuzhenxin

fuzhenxin commented Jun 26, 2020

I met the same problem:

A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xfffffff4 in tid xxx (xxx), pid xxx (xxx)  

After building TensorFlow without SSE4.1 and SSE4.2, it works!
My compiled TensorFlow is available here: https://github.com/fuzhenxin/Tensorflow-Lite-Select-TF-Ops-AAR

@OscarVanL
Contributor

Is there any possibility this issue could be solved natively in tensorflow-lite? I'm also experiencing it when running an Android emulator.

@terryheo
Member

The internal build script has been updated.
I've verified the fix with today's org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly release.
Please let me know if you still have the issue. Thanks!

@particlebbq
Author

Looks like this works on my setup -- many thanks to all who helped!

@OscarVanL
Contributor

This also worked for me.

If, like me, you're not familiar with Gradle: because the dependency version string is the same for every nightly build, you need to refresh your dependencies to get the latest nightly release.

Click Gradle (right-hand side of Android Studio), then Execute Gradle Task (the elephant icon), and enter gradle build --refresh-dependencies. This re-downloads all your dependencies.

@OscarVanL
Contributor

Has this issue returned for anyone else? I've done the above process to re-download dependencies, but my app has started crashing with the same segfault errors again.

build.gradle:

    implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
    implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly'
    implementation 'org.tensorflow:tensorflow-lite-support:0.0.0-nightly'

Crash:

W/native: cpu_feature_guard.cc:36 The TensorFlow library was compiled to use SSE instructions, but these aren't available on your machine.
A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xfffffff4 in tid 32685 (init-21), pid 32643 (gdp19.truevoice)

@abattery
Contributor

abattery commented Dec 8, 2020

@terryheo could you take a look at #38025 (comment) ?

@terryheo
Member

terryheo commented Dec 8, 2020

I have verified the issue. Let me dig in and prepare a fix.

@terryheo terryheo reopened this Dec 8, 2020
copybara-service bot pushed a commit that referenced this issue Dec 9, 2020
Avoid using std::cerr.
This patch resolves GitHub issue #38025.

PiperOrigin-RevId: 346452855
Change-Id: Ife2504476b265909814c09c900cbb5090a55fcc5
@terryheo
Member

The change was merged a week ago, but the nightly still crashes because of a problem updating JCenter.
In the meantime, you can use the nightly directly from Cloud Storage:
https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/google3/ubuntu_16/lite/nightly/511/20201214-223707/tensorflow-lite-select-tf-ops.aar

@tensorflowbutler
Member

Hi there,

We are checking to see if you still need help on this issue, as you are using an older version of TensorFlow (1.x), which is officially considered end of life. We recommend that you upgrade to 2.4 or a later version and let us know if the issue still persists in newer versions.

This issue will be closed automatically 7 days from now. If you still need help, please open a new issue against 2.x, and we will get you the right help.
