Emacs: Fatal error 11: Segmentation fault #2599

Closed
Grimler91 opened this issue Jul 1, 2018 · 40 comments · Fixed by #4909
Labels
bug report Something is not working properly

Comments

@Grimler91
Member

This happens on arm, Android 7. At least 3 other people have experienced the same, according to a Google+ post. The crashes aren't very frequent and I haven't found an easy way to reproduce them.

I have experienced this since around 4cba233 but building the previous version (25c6980) doesn't change anything.

I've built a debug version and investigated with gdb and valgrind though.
The debug deb for emacs and all dependencies are available from https://grimler.se/dists/testing/debug.

gdb shows:

Program received signal SIGSEGV, Segmentation fault.
0xb69dd240 in sigsetjmp () from /system/lib/libc.so
(gdb) bt
#0  0xb69dd240 in sigsetjmp () from /system/lib/libc.so
#1  0xdd0e51c8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Valgrind, meanwhile, reports lots and lots of "Conditional jump or move depends on uninitialised value(s)" and "Use of uninitialised value of size 4" errors. A snippet from running valgrind --leak-check=full --track-origins=yes -v might look like:

==8250== Conditional jump or move depends on uninitialised value(s)
==8250==    at 0x1AF8E2: mark_maybe_object (alloc.c:4705)
==8250==    by 0x1AFD69: mark_memory (alloc.c:4895)
==8250==    by 0x1AFD69: mark_stack (alloc.c:5038)
==8250==    by 0x1AFD69: garbage_collect_1 (alloc.c:5760)
==8250==    by 0x1AFD69: Fgarbage_collect (alloc.c:5983)
==8250==    by 0x1BF401: eval_sub (eval.c:2169)
==8250==    by 0x1BF69D: Fprogn (eval.c:431)
==8250==    by 0x1BF8AF: funcall_lambda (eval.c:2922)
==8250==    by 0x1BFBBF: Ffuncall (eval.c:2760)
==8250==    by 0x1BFC55: funcall_nil (eval.c:2338)
==8250==    by 0x1BEC59: run_hook_with_args (eval.c:2515)
==8250==    by 0x1BEC9D: Frun_hook_with_args (eval.c:2380)
==8250==    by 0x1BFA79: Ffuncall (eval.c:2679)
==8250==    by 0x1E2CC3: exec_byte_code (bytecode.c:880)
==8250==    by 0x1BF785: funcall_lambda (eval.c:2863)
==8250==  Uninitialised value was created by a stack allocation
==8250==    at 0x1AFA38: Fgarbage_collect (alloc.c:5929)
==8250==
==8250== Use of uninitialised value of size 4
==8250==    at 0x1AF8E4: mark_maybe_object (alloc.c:4705)
==8250==    by 0x1AFD69: mark_memory (alloc.c:4895)
==8250==    by 0x1AFD69: mark_stack (alloc.c:5038)
==8250==    by 0x1AFD69: garbage_collect_1 (alloc.c:5760)
==8250==    by 0x1AFD69: Fgarbage_collect (alloc.c:5983)
==8250==    by 0x1BF401: eval_sub (eval.c:2169)
==8250==    by 0x1BF69D: Fprogn (eval.c:431)
==8250==    by 0x1BF8AF: funcall_lambda (eval.c:2922)
==8250==    by 0x1BFBBF: Ffuncall (eval.c:2760)
==8250==    by 0x1BFC55: funcall_nil (eval.c:2338)
==8250==    by 0x1BEC59: run_hook_with_args (eval.c:2515)
==8250==    by 0x1BEC9D: Frun_hook_with_args (eval.c:2380)
==8250==    by 0x1BFA79: Ffuncall (eval.c:2679)
==8250==    by 0x1E2CC3: exec_byte_code (bytecode.c:880)
==8250==    by 0x1BF785: funcall_lambda (eval.c:2863)
==8250==  Uninitialised value was created by a stack allocation
==8250==    at 0x1AFA38: Fgarbage_collect (alloc.c:5929)

The exact lines and functions vary though; another log had:

==7058== Conditional jump or move depends on uninitialised value(s)
==7058==    at 0x1BF5D2: mem_find (alloc.c:4212)
==7058==    by 0x1C18A7: mark_maybe_pointer (alloc.c:4889)
==7058==    by 0x1BF40F: mark_memory (alloc.c:4985)
==7058==    by 0x214E4F: mark_one_thread (thread.c:616)
==7058==    by 0x2146E9: mark_threads_callback (thread.c:649)
==7058==    by 0x1BFE15: garbage_collect_1 (alloc.c:6001)
==7058==    by 0x1D2CAD: eval_sub (eval.c:2231)
==7058==    by 0x1D2E83: Fprogn (eval.c:455)
==7058==    by 0x1D591B: funcall_lambda (eval.c:3045)
==7058==    by 0x1D50B1: Ffuncall (eval.c:2784)
==7058==    by 0x1D52FB: funcall_nil (eval.c:2400)
==7058==    by 0x1D52D9: run_hook_with_args (eval.c:2576)
==7058==  Uninitialised value was created by a stack allocation
==7058==    at 0x1D2798: eval_sub (eval.c:2113)
==7058==
==7058== Use of uninitialised value of size 4
==7058==    at 0x1BF5C4: mem_find (alloc.c:4210)
==7058==    by 0x1C18A7: mark_maybe_pointer (alloc.c:4889)
==7058==    by 0x1BF40F: mark_memory (alloc.c:4985)
==7058==    by 0x214E4F: mark_one_thread (thread.c:616)
==7058==    by 0x2146E9: mark_threads_callback (thread.c:649)
==7058==    by 0x1BFE15: garbage_collect_1 (alloc.c:6001)
==7058==    by 0x1D2CAD: eval_sub (eval.c:2231)
==7058==    by 0x1D2E83: Fprogn (eval.c:455)
==7058==    by 0x1D591B: funcall_lambda (eval.c:3045)
==7058==    by 0x1D50B1: Ffuncall (eval.c:2784)
==7058==    by 0x1D52FB: funcall_nil (eval.c:2400)
==7058==    by 0x1D52D9: run_hook_with_args (eval.c:2576)
==7058==  Uninitialised value was created by a stack allocation
==7058==    at 0x1D2798: eval_sub (eval.c:2113)

Two example full valgrind logs are available at https://grimler.se/emacs_segfault2 and https://grimler.se/emacs_segfault3.
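
Since the logs contain a large number of these reports, a rough tally by error kind and innermost frame can make patterns easier to spot. The following is only a minimal sketch, assuming the log has been saved to a file in the format shown above; summarise_valgrind is a hypothetical helper name, not part of valgrind, and it only looks at the two error kinds quoted in this issue.

```shell
# Tally valgrind reports by error kind and innermost ("at") frame.
# Assumes lines of the form:
#   ==PID== Conditional jump or move depends on uninitialised value(s)
#   ==PID==    at 0xADDR: function (file.c:LINE)
# summarise_valgrind is a hypothetical helper, not a valgrind feature.
summarise_valgrind() {
    awk '
        # Remember the error kind when one of the two headers appears.
        /^==[0-9]+== (Conditional jump|Use of uninitialised)/ {
            sub(/^==[0-9]+== /, "")
            kind = $0
            next
        }
        # The first "at" frame after a header is the innermost frame.
        kind != "" && /^==[0-9]+== +at / {
            count[kind " in " $4]++
            kind = ""
        }
        END {
            for (k in count) printf "%6d  %s\n", count[k], k
        }
    ' "$1" | sort -rn
}
```

It would be called with the path of a saved log, for example `summarise_valgrind valgrind.log` (the file name is an assumption). The "Uninitialised value was created by a stack allocation" frames are deliberately skipped so that only the site where the value is used gets counted.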

Running emacs under valgrind on my aarch64 device gives similar problems, but I haven't experienced a segfault there. I therefore assume that we have a problem in our emacs on all arches.

I'm not sure how to debug this further.

@Grimler91 Grimler91 added the bug report Something is not working properly label Jul 1, 2018
@mindbound

Same here, arm, Android 6.

@jeromezero

I have experienced the same issue. Nvidia shield tablet. Android 7 now, consistent segfaults since 6.
I don't think they happened at all when I was back on Lollipop ...

@Factavi

Factavi commented Aug 1, 2018

I am getting a lot of emacs seg faults especially with touch-scrolling. (Android 7)

@Grimler91
Member Author

@Factavi Hm, I mostly use a hardware keyboard so haven't noticed if it happens often when touch-scrolling.

For me it seems to happen more or less randomly.

@abbaxi

abbaxi commented Oct 7, 2018

This happens frequently to me. I'm using the ARM architecture, bluetooth keyboard, most current version of emacs in termux's repository, as of 15:00 MST, OCT 6th.

Emacs is primarily what I use termux for, this is an ongoing, massive source of frustration.
Let me know if I can be more helpful,
-abbaxi

@kanreki

kanreki commented Nov 23, 2018

This happens for me too: HP Chromebook 13 G1

Normally I live inside Emacs all the time (including shell window). But this is a deal-breaker. I've resorted to learning Vim, and you know how horrifying that is.

Been happening since I first installed Termux, approx. May, 2017. I can't say how often, because this has caused me to mostly abandon Emacs.

@kanreki

kanreki commented Nov 23, 2018

Sorry, the HP Chromebook 13 G1 uses an Intel Core m7, so I guess I should not have piled on to this ARM bug report.

@jeromezero

jeromezero commented Nov 23, 2018 via email

@sjuswede

sjuswede commented Feb 7, 2019

Happens to me frequently as well, on a Samsung Tab A (2016). Really annoying, as I pretty much got the tablet for Termux and Emacs.

If there is any way I can help, please let me know!

@oscarfv

oscarfv commented Mar 22, 2019

I think that Emacs on termux was built with CANNOT_DUMP, which is mostly untested. The development version of Emacs (branch master on the git repo) has a new portable dumper. It would be interesting to try it and see if the segfault goes away.

@ferfebles

Same problem here with a Samsung Tab A (2016). If I start emacs with "emacs -q" it seems to be stable. But if I load the init.el file, it gives a segmentation fault after a few seconds of use.

Trying to reduce the init.el file... But I'm losing so much of Emacs that I would prefer to use nano.

@oscarfv

oscarfv commented Mar 27, 2019

For me it segfaults with -Q too. With or without init file, in my experience the time it takes to segfault varies: from a few seconds to 20 minutes or more.

@ferfebles

You're right @oscarfv, it segfaults with 'emacs -q' too. It took a bit longer, but it gave me the error after scrolling up and down the '.emacs' file about 20 times.

I tested emacs with the new Android 7 branch (in alpha testing) but it gave me the same error.
I don't have the knowledge to try to build emacs without CANNOT_DUMP. If someone finds any solution please post it here.

@oscarfv

oscarfv commented Apr 16, 2019

Tried emacs 27 with the new portable dumper. Same crash.

@oscarfv

oscarfv commented Apr 16, 2019

A fast method for triggering the crash here is to visit some file with a few hundred lines and keep the cursor-down key pressed. Usually it will crash on reaching the bottom edge of the window after scrolling a few screenfuls of text.

@ghost ghost mentioned this issue Nov 2, 2019
@ghost ghost changed the title from "Emacs: Fatal error 11: Segmentation fault on arm" to "Emacs: Fatal error 11: Segmentation fault" Nov 2, 2019
@Grimler91
Member Author

So, I've tried the following since last year (without solving the problem):

  • Building emacs on device (with different configurations)
  • Cross-compiling with emacs_cv_func_sigsetjmp=no (this introduced some other problems but didn't fix the segfault)
  • Doing the hostbuild of emacs with -m32 (this solved a problem for nethack, but would have been weird if this worked since a local build segfaults)

Since gdb cannot tell where the crash comes from, even if emacs and all dependencies are built with debug symbols, I am guessing the problem arises in a /system/lib library. This does not mean that the bug is not solvable, just that it is harder to pinpoint the actual problem and solve it.

I also do not think I have encountered the segfault when using emacs --daemon + emacsclient, but I am not using the daemon a lot.

@snogglethorpe

snogglethorpe commented Nov 7, 2019 via email

@oscarfv

oscarfv commented Nov 7, 2019

@snogglethorpe : If Emacs qualifies as a large package for you, in April I compiled Emacs 27 on a cheap, generic quad-core tablet without problems.

@Grimler91
Member Author

@snogglethorpe your device will (most likely) shut down if the CPU reaches too high a temperature, so it's not really anything you need to be afraid of.

@snogglethorpe

Ok, I've built Emacs master on my CB, and it seems to work properly (with a little futzing around to update the Termux patches), dumping and all, and I can run "emacs -fg-daemon" in gdb...

Unfortunately, it's so far stubbornly refused to crash...! ^^;

[Which doesn't mean much, sometimes the distro emacs runs for many many hours before crashing...]

@snogglethorpe

snogglethorpe commented Nov 27, 2019

Ok, now it's crashing as expected, in the same maddeningly hard to debug way the emacs package does.

BTW, one thing is much better about using this build than the current Termux emacs package: it's dumped, so restarting emacs after a crash is much faster, essentially instantaneous, whereas the non-dumped emacs in the Termux package takes 3-4 seconds to start.

Would it be possible to use dumping for the Termux emacs package?

@ghost

ghost commented Nov 27, 2019

> Would it be possible to use dumping for the Termux emacs package?

No. See https://github.com/termux/termux-packages/blob/master/packages/emacs/build.sh#L62.

@oscarfv

oscarfv commented Nov 27, 2019

@snogglethorpe : dumping Emacs 27 works thanks to the new portable dumper. Termux packages the latest Emacs stable release, i.e. 26.

@Grimler91
Member Author

@oscarfv omg, finally a dumped emacs in termux, thanks for the tip!

And I haven't been able to make it crash in my initial testing (scrolling ~10k lines). I'll leave this issue open until emacs-27 is packaged, and will close it if I haven't been able to reproduce a crash by then!

@snogglethorpe

snogglethorpe commented Nov 30, 2019

I've been using dumping in my local build (built in Termux) of the emacs trunk (so version 27.x), and it works fine—and starts up much faster than an un-dumped emacs.

However, it does still exhibit the random hard-to-debug crashiness of the current Termux emacs package (which doesn't use dumping). As with that version of emacs, it's pretty random: sometimes it goes for ages without crashing, sometimes it crashes almost immediately after startup.

@oscarfv

oscarfv commented Nov 30, 2019

@Grimler91 : even if it works for you the bug should remain open because, first, Emacs 27 is not the current release and, second, the crash also happens with Emacs 27.

When I tried Emacs 27 my experience matched what @snogglethorpe describes.

@kanreki

kanreki commented Feb 9, 2020 via email

@zettelmuseum
Contributor

zettelmuseum commented Feb 9, 2020

Here is a way to reliably reproduce this crash.
Vanilla emacs from termux apt, no configs, no packages, no .emacs.

Step 1 (create our test file, may take a few seconds)

for i in {1..200}; do echo "* $i"; echo "** $i"; for j in {1..1000}; do echo "bla"; done;  done > test123.org
emacs --file test123.org

Now, it is important to do Step 2 very quickly, especially holding the down key after the return key. If it doesn't crash the first time, just repeat Step 2 a few more times. It always crashes for me after a few tries.
EDIT: It may take up to 10 tries. Also, zoom your termux display to 20 rows.

Step 2

C-x C-v RET
*immediately* after RET press the down key 
and HOLD IT until crash or end of file. 
(I'm using Hacker's keyboard)

Can anyone also reproduce it this way?
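
For anyone who wants to vary the size of the test file, the Step 1 one-liner can be wrapped in a small function. This is just a sketch under assumptions: make_org_testfile and its three optional arguments are hypothetical names introduced here, and awk is used instead of bash brace expansion so it also runs in plain sh.

```shell
# Generate an org file like the one in Step 1, with adjustable sizes.
# make_org_testfile is a hypothetical helper; the defaults mirror Step 1
# (200 headings, 1000 "bla" lines under each, written to test123.org).
make_org_testfile() {
    headings=${1:-200}
    lines=${2:-1000}
    out=${3:-test123.org}
    awk -v h="$headings" -v n="$lines" 'BEGIN {
        for (i = 1; i <= h; i++) {
            print "* " i      # top-level heading
            print "** " i     # second-level heading
            for (j = 1; j <= n; j++) print "bla"
        }
    }' > "$out"
}
```

Calling `make_org_testfile` without arguments should produce the same 200,400-line file as Step 1; something like `make_org_testfile 50 200 small.org` gives a smaller file to check whether file size matters.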

@zettelmuseum
Contributor

@kanreki
can you try again with the new method?

@hindux

hindux commented Feb 9, 2020

@zettelmuseum, no, it doesn't crash

@zettelmuseum
Contributor

@krishna-arch
you may need to repeat step 2 up to 10 times and do it quick
never took more than 10 tries here

@Grimler91
Member Author

> @zettelmuseum, no, it doesn't crash

The crash seems to only happen on arm, and mostly (only?) on Samsung devices. Maybe @krishna-arch has another type of device.

@zettelmuseum I can reproduce the crash with your testfile. Can't tell if it crashes faster compared to just scrolling in a "normal" file, but it crashes nonetheless.

@zettelmuseum
Contributor

also it seems to crash faster if you zoom termux display (max. 20 rows)

@zettelmuseum
Contributor

@Grimler91 by the way, not a Samsung device here, just a generic arm phablet

@Grimler91
Member Author

> by the way, not a Samsung device here, just a generic arm phablet

Good to know, thanks for the info!

On another note: I've noticed that compiling emacs with --enable-checking='yes,glyphs' (as suggested in the DEBUG notes) gives a make error:

[...]
Loading /data/data/com.termux/files/home/projects/emacs/lisp/emacs-lisp/syntax.el (source)...
Loading /data/data/com.termux/files/home/projects/emacs/lisp/font-lock.el (source)...
Loading /data/data/com.termux/files/home/projects/emacs/lisp/jit-lock.el (source)...

../../src/fns.c:2856: Emacs fatal error: assertion failed: !FIXNUM_OVERFLOW_P (lisp_h_make_fixnum_n)
Fatal error 6: n
make[1]: *** [Makefile:817: bootstrap-emacs.pdmp] Aborted
make[1]: Leaving directory '/data/data/com.termux/files/home/projects/emacs/build/src'
make: *** [Makefile:424: src] Error 2

Might be related to this bug, or maybe it's just a problem on the emacs-27 branch. I'll ask for advice on the emacs mailing list.

@zettelmuseum
Contributor

zettelmuseum commented Feb 9, 2020

Here's another interesting observation.
Using the method described above, I ran Step 2 2x30 times, with and without TMUX.
Without tmux: 8 crashes, 8/30 runs crashed, 27%.
With tmux: 0 crashes, 0/30 runs crashed, 0%.

Too early to tell what this means, but I'm going to run emacs inside tmux from now on :-)

EDIT: this is using the exact same termux package, not proot or anything.

apt install tmux
tmux
emacs
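
To make a comparison like the one above easier to repeat, the outcome of each Step 2 run can be noted as one word per line ("crash" or "ok") and tallied afterwards. The following is a minimal sketch; tally_runs is a hypothetical helper name and the one-word-per-line format is an assumption made here, not something used in the original test.

```shell
# Tally manually recorded run outcomes read from standard input.
# Input: one word per line, "crash" or anything else (e.g. "ok").
# tally_runs is a hypothetical helper introduced for illustration.
tally_runs() {
    awk '
        $1 == "crash" { crashes++ }   # count crashed runs
        NF            { runs++ }      # count every non-empty line as a run
        END {
            if (runs == 0) { print "no runs recorded"; exit }
            printf "%d/%d runs crashed (%.0f%%)\n", crashes, runs, 100 * crashes / runs
        }
    '
}
```

Feeding it 8 "crash" lines and 22 "ok" lines reproduces the "8/30 runs crashed (27%)" figure for the run without tmux.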

@Grimler91
Member Author

I have merged a potential fix for this: 996c569. It should be available in a few minutes.
If anyone still gets segfaults after upgrading emacs to 26.3-5, please let me know.

@snogglethorpe

That change only addresses arm, but the same crash happens on x86 ....

@Grimler91
Member Author

@snogglethorpe thanks, fixed in c6fe679. 26.3-6 should be available in a couple of minutes

@kanreki

kanreki commented Feb 15, 2020

So far, so good! Thanks!! I will certainly continue using this, and will report back if anything is amiss.

ianrabt pushed a commit to ianrabt/termux-packages that referenced this issue Jul 2, 2020
@ghost ghost locked and limited conversation to collaborators Oct 9, 2021