
Segmentation Fault when using Coral Edge TPU #62371

Closed
Skillnoob opened this issue Nov 10, 2023 · 31 comments
Assignees
Labels
comp:lite (TF Lite related issues), stat:awaiting response (Status - Awaiting response from author), TF2.14 (For issues related to Tensorflow 2.14.x), type:bug (Bug)

Comments

@Skillnoob

Skillnoob commented Nov 10, 2023

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.14.0

Custom code

Yes

OS platform and distribution

Raspberry Pi OS Bookworm

Mobile device

No response

Python version

3.11.2

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

The program crashes as soon as it tries to load the Edge TPU model.
I already asked in the YOLOv8 Discord, and after looking at the stack trace they told me to open an issue here.
The hardware is a Raspberry Pi 5 with 8 GB of RAM, but the same issue also happens on a Raspberry Pi 4B with 2 GB of RAM.

Standalone code to reproduce the issue

import cv2
from ultralytics import YOLO


def main():
    model = YOLO('model_edgetpu.tflite', task='detect')  # must be a YOLOv8 model converted to TFLite and compiled for the Coral Edge TPU

    camera = cv2.VideoCapture(0)

    while camera.isOpened():
        success, frame = camera.read()
        if not success:
            break

        results = model.predict(frame)

        annotated_frame = results[0].plot()

        cv2.imshow("YOLOv8 Inference", annotated_frame)

        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    camera.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    main()
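
For isolating the crash from Ultralytics and OpenCV, here is a rough sketch that drives the TFLite interpreter and the Edge TPU delegate directly (the model path is a placeholder; per the stack trace below, the failure is inside ModifyGraphWithDelegate while the delegate is being attached):

import numpy as np
import tensorflow as tf

MODEL_PATH = 'model_edgetpu.tflite'  # placeholder: any Edge TPU-compiled model


def main():
    # Attaching libedgetpu is what drives Subgraph::ModifyGraphWithDelegate,
    # the frame at the top of the stack trace below.
    delegate = tf.lite.experimental.load_delegate('libedgetpu.so.1')

    interpreter = tf.lite.Interpreter(model_path=MODEL_PATH,
                                      experimental_delegates=[delegate])
    interpreter.allocate_tensors()

    # If the delegate was applied successfully, a dummy inference should run.
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp['index'], np.zeros(inp['shape'], dtype=inp['dtype']))
    interpreter.invoke()
    print('output shape:', interpreter.get_output_details()[0]['shape'])


if __name__ == '__main__':
    main()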

Relevant log output

Crash when running through gdb to get the stack trace:

(gdb) run ai_model_copy.py run ai_model_copy.py 
Starting program: /usr/bin/python3 ai_model_copy.py run ai_model_copy.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5a4f180 (LWP 2502)]
[New Thread 0x7ffff523f180 (LWP 2503)]
[New Thread 0x7ffff0a2f180 (LWP 2504)]
[New Thread 0x7fffd9bef180 (LWP 2505)]
[New Thread 0x7fffd73df180 (LWP 2506)]
[New Thread 0x7fffd4bcf180 (LWP 2507)]
[Detaching after vfork from child process 2508]
[Detaching after vfork from child process 2509]
[Detaching after vfork from child process 2510]
[New Thread 0x7fff94aaf180 (LWP 2517)]
[New Thread 0x7fff9229f180 (LWP 2518)]
[New Thread 0x7fff91a8f180 (LWP 2519)]
Loading mode_edgetpu.tflite for TensorFlow Lite Edge TPU inference...
[New Thread 0x7fff89bef180 (LWP 2521)]
[Thread 0x7fff89bef180 (LWP 2521) exited]
[New Thread 0x7fff89bef180 (LWP 2522)]
[Thread 0x7fff89bef180 (LWP 2522) exited]
[New Thread 0x7fff89bef180 (LWP 2523)]
[New Thread 0x7fff893df180 (LWP 2524)]
[New Thread 0x7fff88bcf180 (LWP 2525)]
[Thread 0x7fff88bcf180 (LWP 2525) exited]
[Thread 0x7fff893df180 (LWP 2524) exited]
[New Thread 0x7fff893df180 (LWP 2526)]
[New Thread 0x7fff88bcf180 (LWP 2527)]
[New Thread 0x7fff83fff180 (LWP 2528)]

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fff8bbee22c in tflite::Subgraph::ReplaceNodeSubsetsWithDelegateKernels(TfLiteRegistration, TfLiteIntArray const*, TfLiteDelegate*) ()
   from /home/pi/.local/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so

Stack trace captured using gdb:

#0  0x00007fff8bbee22c in tflite::Subgraph::ReplaceNodeSubsetsWithDelegateKernels(TfLiteRegistration, TfLiteIntArray const*, TfLiteDelegate*) ()
   from /home/pi/.local/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#1  0x00007fff8bbee1e0 in tflite::Subgraph::ReplaceNodeSubsetsWithDelegateKernels(TfLiteContext*, TfLiteRegistration, TfLiteIntArray const*, TfLiteDelegate*) ()
   from /home/pi/.local/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#2  0x00007fff89c6be24 in ?? () from /lib/aarch64-linux-gnu/libedgetpu.so.1
#3  0x00007fff8bbf3058 in tflite::Subgraph::ModifyGraphWithDelegateImpl(TfLiteDelegate*) () from /home/pi/.local/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#4  0x00007fff8bbf3624 in tflite::Subgraph::ModifyGraphWithDelegate(TfLiteDelegate*) () from /home/pi/.local/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#5  0x00007fff8bbe5ce8 in tflite::impl::Interpreter::ModifyGraphWithDelegateImpl(TfLiteDelegate*) () from /home/pi/.local/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#6  0x00007fff8b92af70 in tflite::interpreter_wrapper::InterpreterWrapper::ModifyGraphWithDelegate(TfLiteDelegate*) ()
   from /home/pi/.local/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#7  0x00007fff8b9273a4 in pybind11::cpp_function::initialize<pybind11_init__pywrap_tensorflow_interpreter_wrapper(pybind11::module_&)::$_23, pybind11::object, tflite::interpreter_wrapper::InterpreterWrapper&, unsigned long, pybind11::name, pybind11::is_method, pybind11::sibling, char [60]>(pybind11_init__pywrap_tensorflow_interpreter_wrapper(pybind11::module_&)::$_23&&, pybind11::object (*)(tflite::interpreter_wrapper::InterpreterWrapper&, unsigned long), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [60])::{lambda(pybind11::detail::function_call&)#1}::__invoke(pybind11::detail::function_call&) ()
   from /home/pi/.local/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#8  0x00007fff8b91772c in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/pi/.local/lib/python3.11/site-packages/tensorflow/lite/python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
#9  0x00000000004c9d5c in ?? ()
#10 0x0000000000494548 in _PyObject_MakeTpCall ()
#11 0x00000000004aa23c in _PyEval_EvalFrameDefault ()
#12 0x00000000004e2cec in _PyFunction_Vectorcall ()
#13 0x000000000049c1d8 in _PyObject_FastCallDictTstate ()
#14 0x00000000004edc24 in ?? ()
#15 0x00000000004944d8 in _PyObject_MakeTpCall ()
#16 0x00000000004aa23c in _PyEval_EvalFrameDefault ()
#17 0x00000000004e2cec in _PyFunction_Vectorcall ()
#18 0x00000000004f3390 in PyObject_Call ()
#19 0x00000000004ae388 in _PyEval_EvalFrameDefault ()
#20 0x00000000004e2cec in _PyFunction_Vectorcall ()
#21 0x000000000049c1d8 in _PyObject_FastCallDictTstate ()
#22 0x00000000004edc24 in ?? ()
#23 0x00000000004944d8 in _PyObject_MakeTpCall ()
#24 0x00000000004aa23c in _PyEval_EvalFrameDefault ()
#25 0x00000000004a0b60 in PyEval_EvalCode ()
#26 0x00000000005fafa8 in ?? ()
#27 0x00000000005f7bd0 in ?? ()
#28 0x0000000000608760 in ?? ()
#29 0x0000000000608308 in _PyRun_SimpleFileObject ()
#30 0x0000000000608070 in _PyRun_AnyFileObject ()
#31 0x000000000060631c in Py_RunMain ()
#32 0x00000000005d0154 in Py_BytesMain ()
#33 0x00007ffff7ce7780 in __libc_start_call_main (main=main@entry=0x5cfff4 <_start+52>, argc=argc@entry=4, argv=argv@entry=0x7ffffffff0f8) at ../sysdeps/nptl/libc_start_call_main.h:58
#34 0x00007ffff7ce7858 in __libc_start_main_impl (main=0x5cfff4 <_start+52>, argc=4, argv=0x7ffffffff0f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:360
#35 0x00000000005cfff0 in _start ()
@google-ml-butler google-ml-butler bot added the type:bug Bug label Nov 10, 2023
@sushreebarsa sushreebarsa added comp:lite TF Lite related issues TF2.14 For issues related to Tensorflow 2.14.x labels Nov 11, 2023
@sushreebarsa sushreebarsa assigned pjpratik and unassigned pjpratik Nov 14, 2023
@sushreebarsa
Contributor

@Skillnoob Could you please make sure that you are using the correct TensorFlow Lite model and interpreter for your Edge TPU device?
The Edge TPU hardware and firmware need to be up to date to avoid such issues.
Thank you!

@sushreebarsa sushreebarsa added the stat:awaiting response Status - Awaiting response from author label Nov 14, 2023
@Skillnoob
Author

@sushreebarsa The model has been converted to TFLite with int8 quantization and compiled using the latest version of the Edge TPU compiler. The Edge TPU runtime is also on the latest version. The Ultralytics module uses the TFLite interpreter from the tensorflow package, IIRC.
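
For reference, a rough sketch of the int8 post-training quantization that precedes the edgetpu_compiler step (the saved-model path, output filename, and representative dataset are placeholders; Ultralytics performs the equivalent internally during export):

import numpy as np
import tensorflow as tf


def representative_dataset():
    # Placeholder calibration data; a real export uses a sample of training images.
    for _ in range(100):
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]


converter = tf.lite.TFLiteConverter.from_saved_model('yolov8n_saved_model')  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open('yolov8n_full_integer_quant.tflite', 'wb') as f:
    f.write(converter.convert())

# The resulting .tflite is then compiled for the Edge TPU with the
# edgetpu_compiler command-line tool and loaded on the device.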

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Nov 14, 2023
@LakshmiKalaKadali
Contributor

@Skillnoob , Could you please provide the model_edgetpu.tflite file to better understand the issue and to investigate further?

Thank You

@LakshmiKalaKadali LakshmiKalaKadali added the stat:awaiting response Status - Awaiting response from author label Nov 17, 2023
@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Nov 17, 2023
@LakshmiKalaKadali
Contributor

LakshmiKalaKadali commented Nov 21, 2023

@Skillnoob, I have verified the TFLite file you provided. It seems that your model is using custom ops. Please find the gist. Could you please check these instructions (documentation) for custom ops and let us know whether they have been followed?

Thank You

@LakshmiKalaKadali LakshmiKalaKadali added the stat:awaiting response Status - Awaiting response from author label Nov 21, 2023
@Skillnoob
Author

@LakshmiKalaKadali The error you got inside the notebook is to be expected, as the model relies on the delegate from the Coral USB Accelerator.
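
To make this concrete: without libedgetpu attached, the CPU-only interpreter cannot resolve the Edge TPU custom op, so something like the following (model path is a placeholder) is expected to fail rather than indicate a broken model:

import tensorflow as tf

try:
    # Placeholder path; any *_edgetpu.tflite model shows the same behaviour.
    interpreter = tf.lite.Interpreter(model_path='model_edgetpu.tflite')
    interpreter.allocate_tensors()
except (RuntimeError, ValueError) as e:
    # Expected on a machine without libedgetpu: the graph contains an
    # Edge TPU custom op that only the delegate can resolve.
    print('Expected failure without the delegate:', e)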

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Nov 21, 2023
@LakshmiKalaKadali
Contributor

Hi @pkgoogle ,
Please look into the issue.

Thank you

@pkgoogle

Hi @Skillnoob, it seems like there is a dependency on some installation instructions from Coral? Can you let us know the Coral instructions you followed prior to receiving this error? Also please note that the website explicitly states:

Python 3.6 - 3.9

as a requirement.

Apparently you are using 3.11.2; you might want to try Python 3.9 and see if that resolves your issue. Thanks for your help.

@pkgoogle pkgoogle added the stat:awaiting response Status - Awaiting response from author label Nov 27, 2023
@Skillnoob
Author

Skillnoob commented Nov 27, 2023

@pkgoogle Hi. I've previously tried the same program on a Raspberry Pi 4B with Python 3.9.2 and had the same crash, but couldn't run it through gdb as the Pi would just freeze. I also tried it on the Pi 5 using an Anaconda virtual env with Python 3.9.18, and the same crash occurred.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Nov 27, 2023
@Skillnoob
Author

I recently tried to run this using tensorflow-aarch64 2.13.1 and 2.15.0, and the crash still occurs.

@Skillnoob
Author

An easier way of reproducing the issue (a Python equivalent of these steps is sketched below):

pip install ultralytics

yolo export model=yolov8n.pt format=edgetpu

Install the Edge TPU runtime as explained here

yolo detect val model=yolov8n_saved_model/yolov8n_full_integer_quant_edgetpu.tflite data=coco128.yaml batch=1
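
The same steps via the Ultralytics Python API, as a rough sketch (it assumes export() returns the path of the exported model, as described in the Ultralytics docs, and that the Edge TPU compiler is installed):

from ultralytics import YOLO

# Export yolov8n to an Edge TPU-compiled TFLite model.
exported_path = YOLO('yolov8n.pt').export(format='edgetpu')

# Validate on COCO128 with batch size 1; this is where the segfault
# shows up once the Edge TPU delegate is attached.
model = YOLO(exported_path, task='detect')
model.val(data='coco128.yaml', batch=1)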

@Skillnoob
Author

I also tried this with every TensorFlow version from the current latest down to 2.12.1 on a Raspberry Pi 4B with Python 3.9.2.

@Skillnoob
Author

This is now resolved thanks to https://github.com/feranick updating the libedgetpu runtime to support newer tflite_runtime versions in google-coral/edgetpu#812.


@feranick
Contributor

I would still keep this issue open. While there is a solution from my forked repository, Google should update the main one, and until then the issue is not solved.

@Skillnoob Skillnoob reopened this Jan 30, 2024
@Skillnoob
Author

@feranick I agree. But it seems that Google has completely abandoned the Coral project.

@pkgoogle

pkgoogle commented Jan 30, 2024

Hi @feranick I don't work with that repo but can you perhaps link your PR that fixes the issue? Maybe I can help push it from here.

Edit: nevermind.. found it. I think this is it? google-coral/libedgetpu#59

@Skillnoob
Author

@pkgoogle That's the correct PR.

@feranick
Contributor

New builds are finally available against TensorFlow v2.15.0 (current). I had to refactor the WORKSPACE to conform to the deprecation of the TF/Toolchain. All seem to be working for me (including on armhf, where it previously only worked with tflite_runtime v2.13.1).

I plan to do a new PR soon.

https://github.com/feranick/libedgetpu/releases/tag/v16.0-TF2.15.0-1

@feranick
Contributor

Hi @feranick I don't work with that repo but can you perhaps link your PR that fixes the issue? Maybe I can help push it from here.

Edit: nevermind.. found it. I think this is it? google-coral/libedgetpu#59

Thanks. That is the PR, but I plan to make a new one based on the last few commits that bring support to TensorFlow 2.15.0. I'll post here.

@feranick
Contributor

feranick commented Jan 31, 2024

BTW, in case someone needs updated tflite_runtime wheels, I prepared a few here.

https://github.com/feranick/TFlite-builds/releases/tag/v2.15.0
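
For anyone swapping in one of these wheels: tflite_runtime exposes the same interpreter API as tf.lite, so a quick smoke test looks roughly like this (model path is a placeholder):

from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path='model_edgetpu.tflite',  # placeholder
    experimental_delegates=[load_delegate('libedgetpu.so.1')],
)
interpreter.allocate_tensors()
print('Edge TPU delegate attached; inputs:', interpreter.get_input_details())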

@cyclux

cyclux commented Feb 27, 2024

@feranick Thanks a lot! Finally, after a long journey down the rabbit hole, this is currently the only solution that works with Python 3.11.

@feranick
Contributor

feranick commented Feb 27, 2024

Hi @feranick I don't work with that repo but can you perhaps link your PR that fixes the issue? Maybe I can help push it from here.

Edit: nevermind.. found it. I think this is it? google-coral/libedgetpu#59

@pkgoogle Just wondering whether you had a chance to move this forward. The correct and current PR is this one:

google-coral/libedgetpu#60

@feranick
Contributor

Thanks very much to @pkgoogle and @Namburger at Google for merging the PR. The libedgetpu library is now fully updated, and I hope binaries will be made available soon through the official channel.

@pkgoogle

Hi @Skillnoob, as @feranick mentioned, the libedgetpu library is now updated. Can you test your case against master and see if it resolves your issue?

@pkgoogle pkgoogle added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Feb 29, 2024
@Skillnoob
Author

@pkgoogle I already tested with @feranick's builds a while ago, and they work without any issues on my RPi 5 with Python 3.11.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Feb 29, 2024
@pkgoogle

@Skillnoob, awesome. If you are confident this issue is resolved and have no more open items, please feel free to close it as completed. Thanks.

@pkgoogle pkgoogle added the stat:awaiting response Status - Awaiting response from author label Feb 29, 2024
