Segmentation fault when executing statically compiled linux application with musl #105
Comments
…same #105 modified: p_elf_enum.h modified: p_lx_elf.cpp modified: stub/src/amd64-linux.elf-main.c modified: ../.github/travis_testsuite_1.sh modified: stub/amd64-linux.elf-fold.h modified: stub/tmp/amd64-linux.elf-fold.map
Fixed for now by 6e541a4 on 'devel' branch. Revamp of stubs for -fPIE is due. |
The input has extraneous info in the target of RELA relocations. The final result is independent of the original contents of the target (all info is in the 24-byte RELA entry), so the original content might as well be 0. However, many of the targets for UwTerminalX2 initially contain a copy of .r_addend, which costs at least 2 bytes of compressed output. There are 50546 RELA targets, so the compressed output is around 100K larger than if the targets began as 0. |
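For anyone who wants to check the arithmetic in the comment above, here is a minimal Python sketch. It assumes the standard little-endian Elf64_Rela layout (r_offset, r_info, r_addend, 8 bytes each) and takes the "at least 2 bytes per target" figure from the comment as given. The names `ELF64_RELA` and `estimated_overhead` are made up for this illustration and are not part of UPX.

```python
import struct

# Elf64_Rela: r_offset (u64), r_info (u64), r_addend (i64), little-endian.
ELF64_RELA = struct.Struct("<QQq")


def estimated_overhead(num_targets, bytes_per_target=2):
    """Lower-bound estimate of the extra compressed bytes when each RELA
    target holds a copy of r_addend instead of 0 (2 bytes each is the
    figure quoted in the comment, not a measured value)."""
    return num_targets * bytes_per_target


print(ELF64_RELA.size)            # size of one RELA entry in bytes
print(estimated_overhead(50546))  # total in bytes, i.e. roughly 100K
```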
I've tested the devel branch and it's now working with 64-bit executables - thank you. However, 32-bit executables compiled on the same system (using the 32-bit distro instead) are segfaulting. GDB trace:
|
@jreiser Test program: https://www.dropbox.com/s/lc680o172pmf45x/UwTerminalX_32_musl_upx.tar.gz?dl=0 (not compressed with upx, I've just tried the two latest builds in the devel builds but the output is the same, executable works before compressing, after compressing segfaults) |
OK got the same problem with a statically compiled musl ARM application (tested with UPX 3.94) on a raspberry pi.
Uncompressed test program: https://www.dropbox.com/s/e2nw9tzsqdajgl8/UwTerminalX_rpi.7z?dl=0 Had to compress it on x86_64 PC as the raspberry pi gave this error when trying to run upx:
|
@jreiser Nice, I've tested on 64-bit and ARM builds and it's great, however there's something very strange when using it on 32-bit executables (I'm using a 64-bit UPX build to compress all the builds): if I run the compressed executable on a 64-bit linux system it runs fine. If I run it on a 32-bit linux system it segmentation faults - however if I run it on the same system using gdb it runs fine without a segmentation fault, therefore I'm not actually able to investigate the problem. Performing an strace gives a SEGV_MAPERR but this is as much information as I can get (I've tried it on a debian 8 VM, an old 3.x-kernel arch linux VM and Fedora core 22 and they all act the same) Link to the UPX compressed 32-bit executable: https://www.dropbox.com/s/h5fft1azj0ww4mc/Uw32a.7z?dl=0 |
@thedjnK Please copy+paste the SEGV_MAPERR and preceding 10 or 15 lines from strace. They should give me another clue while I rustle up a real i686 system. (Running Uw32a in 32-bit mode on x86_64 does work for me, too.) |
@thedjnK Running Fedora-Live-Workstation-i686-23-10.iso from 1.5 years ago:
I see no problems in 10 attempts under strace. One of them begins:
which is well past the upx de-compression stub, and into Qt initialization. How big is your shell environment?
(Delete lines with sensitive info, but please tell me about how many bytes you deleted.) |
@thedjnK Similar lack of failure using:
There is one clue in the strace, which begins:
Note the 'old_mmap' vs 'mmap2'. Old_mmap uses register %ebx to point to a 6-word vector of arguments, almost always at top-of-stack. (new)mmap2 puts arguments into registers %ebx, %ecx, %edx, %esi, %edi, %ebp. Both use %eax=0xC0 as the syscall number. |
Fedora set: 1648 5411 59264 Fedora set values: https://pastebin.com/2guZB8pF (nothing removed) Fedora strace:
Debian... Now this is odd, it didn't have strace so I installed strace and the executable now runs fine all the time. Removed strace and it still runs fine so I'm not sure what's happened. Debian set: 1854 5799 62957 Debian set values: https://pastebin.com/HGpZRXYz (nothing removed) Debian strace:
Arch set: 61 77 1559 Arch set values: https://pastebin.com/r37ztpcm (nothing removed) Arch strace:
|
Managed to reproduce the issue with a 64-bit ubuntu VM so it appears it isn't limited to 32-bit hosts. Ubuntu set: 2803 8266 94138 Ubuntu set values: https://pastebin.com/UgrNWgh0 (nothing removed) Ubuntu strace (non-working UPX version):
Ubuntu strace (working non-UPX version):
This is ubuntu 14.04 LTS, uname: |
@thedjnK Commit b2115a4 on 'devel' branch cleans [most of] the stack (on i386 ONLY for now) when the de-compression stub finishes, and may help. The problem is that some user code assumes that "virgin" stack frames have been zeroed. For the case of the UPX de-compression stub, then a large number of shell environment variables (regardless of character length) can make the problem more visible. Another item: Please try running the never-compressed Uw32 under valgrind. Memcheck complains of Uninitialized values. This almost certainly indicates bugs in the user code, which could be one source of "random" SIGSEGV errors. The commit above is an attempt to "cover up" some of the problem. Example complaints from memcheck:
For the Uw32 test case, then address 0xF169C6 is the common tail of all syscalls. I have entered https://bugs.kde.org/show_bug.cgi?id=381304 requesting that memcheck track the syscall number as well as the program counter. There is a possibility that some of the uninit values are created as a result of a misunderstanding about brk(), sbrk(), and malloc(). This is tricky; I'm still looking at it. "created by a stack allocation" gives an address near the start of the subroutine. Use the un-stripped program to look up which routine.
|
@jreiser I recompiled without stripping: https://www.dropbox.com/s/s6f6agxsjo32zzt/Uw32f.tar.gz?dl=0 I'm not sure if the addresses have changed but if not then the output is:
Seems to be related to xcb. Running the unstripped executable in valgrind returns ~830,000 errors, I'm seeing a lot in malloc (this is tested on a 64-bit system whereby running the executable normally is fine), e.g.
I noticed 0x4CFF10 a few times so I'm assuming this is the location you were seeing:
I'll try rebuilding without using the system tray icon functionality and test it alongside the patched release soon. I noticed one in an SSL function regarding sorting engines but I assume the code on the failing systems doesn't get this far before segfaulting and therefore isn't related. |
Just tried the new upx commit and removing the system tray code & SSL code but it's still segfaulting (just trying on 64-bit ubuntu) |
@thedjnK Please file a report against libmusl for poor integration with valgrind. libmusl is wasting the time of developers who use it.
Conditionally clearing each word of a newly malloc()ed block causes the read-before-initialized complaint. Valgrind(memcheck) has low-overhead mechanisms for communicating about custom allocators. Also, modern memset() is no slouch. If a modest fraction of the block ever is written after __malloc0() exits, then it is arguable whether the conditional clear saves any time overall. Loading the data cache tends to dominate other delays. I'm still looking at UwTerminal ... |
@thedjnK I have not been able to reproduce the problem running any of the 32-bit programs (compressed or not) on
nor on the i586 kernel which corresponds to the i686-pae kernel. I installed and ran on real hardware: a Core2Duo with sse, sse2, ssse3, and sse4_1 CPU feature flags, but not sse4_2 [grep flags /proc/cpuinfo]. valgrind(memcheck) is not as useful as I expected. libmusl not integrating with memcheck, and memcheck not understanding all the intricacies of socket() calls, dramatically lower the usefulness of memcheck on this problem. I'm disappointed. So, I think it's time for a "reset" of sorts. You can reproduce the problem, so let's get usable info from that: namely, a core dump. Please search the web for "Debian enable core dump", implement those steps, generate a core dump, and send me a pointer to it (along with which executable you were running.) Meanwhile, I will backup and try to reproduce your environment more faithfully. Please tell me the details of the virtualization that you use: kvm or xen, which version of the virtualization, which version of the Linux kernel running under the virtualization, which CPU flags are enabled (particularly sse, sse2, ssse3, sse4_1, sse4_2, avx). |
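A small Python sketch for collecting the CPU flag information requested above, as an alternative to reading the `grep flags /proc/cpuinfo` output by eye. It assumes an x86 guest, where /proc/cpuinfo has a `flags` line (other architectures use different keys). The helper names `cpu_flags` and `report` are hypothetical.

```python
WANTED = ("sse", "sse2", "ssse3", "sse4_1", "sse4_2", "avx")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text):
    """Map each flag of interest to True/False depending on its presence."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in WANTED}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    for name, present in report(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

Run inside the VM, this prints one yes/no line per flag, which can be pasted into the issue.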
I wasn't able to reproduce it on debian. This, however, is very odd. I downloaded a ubuntu 16.04 LTS ISO and tested on that - no problems. I also tried with an older 14.04 LTS ISO and that too ran it without issue (I'm now using a different host system for testing with a skylake CPU). I then copied the problematic 14.04 LTS image from the other host to this host and tried it in that and it has the segmentation fault. The original host is a 64-bit windows 7 laptop and this host is 64-bit arch linux, using virtualbox 5.x. The original host has a Haswell i5-4300M CPU http://ark.intel.com/products/76347/Intel-Core-i5-4300M-Processor-3M-Cache-up-to-3_30-GHz and this PC has an i5-6600K CPU https://ark.intel.com/products/88191/Intel-Core-i5-6600K-Processor-6M-Cache-up-to-3_90-GHz
The flags from the problematic ubuntu VM:
Core dump from the ubuntu system (using the Uw32a file linked earlier, the system information for this VM is also in a previous post): https://www.dropbox.com/s/6njfoxbqke0revs/core.tar.gz?dl=0 I'll see if I can reduce the size of this VM image (currently at 8GB) and upload it |
@jreiser OK I've compressed the image down and uploaded it to https://mega.nz/#!F3gk1CwT!Noc9y7y_tMYBp5UTkJgZ7-Cz3LDLTYR8H4lMSD4fQ-Q If you create a new 64-bit ubuntu VM in virtualbox 5.x and select 'use an existing hard drive' and select the extracted file, then in settings set the RAM to 1GB and CPU count to 1 or 2 then run it and login (password is password). Click the Okay button, right click and go to Application > Terminal Emulators > Gnome Terminal, then you can run the executable from here using ./Uw32a and see the segmentation fault. |
@thedjnK Thank you. I have downloaded the core.tar.gz and the ubuntu VM image, and am investigating. |
#105 modified: stub/src/i386-linux.elf-entry.S modified: stub/i386-linux.elf-entry.h modified: stub/tmp/i386-linux.elf-entry.bin.dump
@thedjnK Fixed on 'devel' branch by 9f20bbb . The VDSO (automatically added to the process image by the Linux kernel) might overlap our desired placement on -pie ET_DYN; so move below it. Thank you for persevering. The VirtualBox really helped. Technique to avoid interference by gdb: Insert a deliberate infinite loop shortly before the code of interest in the UPX stub. Re-compress Uw32a, run in background (./Uw32a &). Attach gdb during infinite loop ("sudo gdb ./Uw32a PID" where the 'sudo' is necessary on Ubuntu in order to PTRACE a non-child process). Adjust: set $pc+=2 etc. |
@jreiser Thank you for taking the time to look at this. I can confirm the fix works on the 64-bit ubuntu VM, however I've tried it on a 32-bit ubuntu 15 VM and the fedora core 22 32-bit VM and unfortunately it's segfaulting on both of them with the updated executable. I'll have a look into debugging on these distros when I get some time. |
Ping - just ran into this issue with a 64 bit binary. |
@voltagex Which architecture and Linux distribution and version ("uname -r")? If you can reproduce the problem with a small binary such as
then please gzip and upload it ("Attach files"). Otherwise please copy+paste the output of "readelf --segments a.out". |
Apologies for the delay.
Host system is Debian sid on kernel 4.12.0-2-amd64, I'm running into the issue in a Docker container that's using Alpine and musl. Dockerfile used to repro the issue
Note that I'm running upx and hitting the issue on the host Debian system. |
@voltagex Exit code 139 = (128 + 11) and SIGSEGV is 11, so that matches. I will look for a kernel 4.12.0-2 system. Meanwhile, please run under gdb:
This information will help diagnose the problem. Thank you. |
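The "139 = 128 + 11" reasoning above can be reproduced with a few lines of Python. This is only a sketch of the common shell convention (exit status 128 + N when a process dies from signal N); a program could also call exit(139) on its own, so the result is a hint, not proof. The function name `describe_exit_status` is made up for this example.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status; values above 128 are assumed
    to follow the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exited with code {status}"


print(describe_exit_status(139))
```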
|
@thedjnK @voltagex Please try the devel branch. I have changed the strategy used by the runtime decompression stub, to remove assumptions about which address space is available. This should remove conflicts caused by placement of [vdso], [vvar], etc. Particularly because of the system-dependent nature of the problems shown here [the placement strategy varies from kernel to kernel], I believe that such conflicts were the cause. |
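To see how the placement of [vdso] and [vvar] differs between kernels, one option is to compare /proc/PID/maps across systems. The Python sketch below parses the usual maps format (address range, permissions, offset, device, inode, name) and checks whether a candidate address range would collide with one of the bracketed mappings. It is an illustration only and is not what the UPX stub does; `parse_maps` and `overlaps` are hypothetical helper names.

```python
def parse_maps(maps_text):
    """Return {name: (start, end)} for bracketed mappings such as [vdso]."""
    special = {}
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5].startswith("["):
            start, end = (int(x, 16) for x in fields[0].split("-"))
            special[fields[5]] = (start, end)
    return special


def overlaps(a, b):
    """True if half-open ranges a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    for name, (start, end) in parse_maps(text).items():
        print(f"{name}: {start:#x}-{end:#x}")
```

Running it on a working and a failing system and comparing the [vdso] lines shows whether the kernel picked a different spot.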
@jreiser Just tested it with the previously failing 32-bit executable and it now works fine, I haven't tested the new build on 64-bit or ARM executables, thank you for the fix! |
This was causing segfaults on alpine: upx/upx#105 Should be fixed in the next upx release
What's the problem (or question)?
Attempting to use upx to compress a statically linked linux executable (against musl instead of glibc using alpine linux). The compression itself is fine but when running the application (arch linux, ubuntu) it segmentation faults instantly. Running the non-upx compressed executable is fine on all systems. I'm only able to get a trace of it on my desktop system as gdb on the VMs gives a 'Not in executable format: file format not recognised' error. All systems are 64-bit, the executable is 64-bit and the gdb version is 64-bit. I'm assuming this might be related to #93 I've also tried the devel branch and using upx on the VM where it segfaults but it still has the problem.
GDB log:
What should have happened?
Expected the program to run.
Do you have an idea for a solution?
Not a clue
How can we reproduce the issue?
Please tell us details about your environment.