Illegal instructions emitted when compiling certain C libraries using 'zig cc'. #7636

Closed
lithdew opened this issue Jan 1, 2021 · 2 comments

lithdew commented Jan 1, 2021

I've been trying to debug this issue for the last few days. The behavior is consistent when compiling and running test programs that use certain C libraries such as LMDB, libmdbx, or sqlite3.

To make reproduction simpler, I'll focus on LMDB as it has the least amount of code (and only one include path + two C source files to compile against).

These are the versions of clang and gcc I tested with:

$ clang --version
clang version 7.1.0 (tags/RELEASE_710/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /nix/store/3m76ry913ky4zb2frdbic3wa7gr69084-clang-7.1.0/bin

$ gcc --version
gcc (GCC) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The test program (test.c) is as follows:

#include <stdio.h>
#include <stdint.h>
#include <assert.h>
#include "lmdb.h"

int main(int argc, char * argv[]) {
  MDB_env * env;
  MDB_dbi dbi;
  MDB_val key, data;
  MDB_txn * txn;
  MDB_cursor * cursor;
  char sval[32];

  assert(mdb_env_create( & env) == MDB_SUCCESS);
  assert(mdb_env_set_maxdbs(env, 2) == MDB_SUCCESS);
  assert(mdb_env_open(env, "./testdb", MDB_NOSUBDIR | MDB_WRITEMAP, 0664) == MDB_SUCCESS);
  assert(mdb_txn_begin(env, NULL, 0, & txn) == MDB_SUCCESS);
  assert(mdb_dbi_open(txn, "test", MDB_CREATE | MDB_DUPSORT | MDB_DUPFIXED, & dbi) == MDB_SUCCESS);

  for (uint64_t i = 0; i < 4096; i++) {
    key.mv_data = "index";
    /* mv_data is a void *, so sizeof(key.mv_data) is the pointer size, not the string length */
    key.mv_size = sizeof("index") - 1;

    data.mv_data = (void * )( & i);
    data.mv_size = 8;

    assert(mdb_put(txn, dbi, & key, & data, 0) == MDB_SUCCESS);
  }

  mdb_close(env, dbi);
  mdb_env_close(env);
  return 0;
}

These are the commands I used for compiling the test program:

$ clang test.c libraries/liblmdb/mdb.c libraries/liblmdb/midl.c -pthread -I libraries/liblmdb -o test
$ gcc test.c libraries/liblmdb/mdb.c libraries/liblmdb/midl.c -pthread -I libraries/liblmdb -o test

The program compiled with either clang or gcc runs to completion and exits successfully.

Now, if I were to use zig cc:

$ zig cc test.c libraries/liblmdb/mdb.c libraries/liblmdb/midl.c -pthread -I libraries/liblmdb -o test

$ ./test
Illegal instruction (core dumped)

Weird. Let's open it up in gdb:

Program received signal SIGILL, Illegal instruction.
0x00000000002136d3 in mdb_xcursor_init1 ()
(gdb) bt
#0  0x00000000002136d3 in mdb_xcursor_init1 ()
#1  0x0000000000209578 in mdb_cursor_put ()
#2  0x0000000000215e56 in mdb_put ()
#3  0x0000000000205443 in main ()

No useful backtrace. I did some printf debugging: no assertions were hit, and I found no dangling or null pointers in mdb_xcursor_init1. So, let's look at what's going on at the assembly level.

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   0x2136c5 <mdb_xcursor_init1+725>        lea    -0xdfbc(%rip),%rax        # 0x205710 <mdb_cmp_long>                               │
│   0x2136cc <mdb_xcursor_init1+732>        mov    %rax,(%rcx)                                                                       │
│   0x2136cf <mdb_xcursor_init1+735>        vzeroupper                                                                               │
│   0x2136d2 <mdb_xcursor_init1+738>        ret                                                                                      │
│  >0x2136d3 <mdb_xcursor_init1+739>        ud2                                                                                      │
│   0x2136d5                                    data16 nopw %cs:0x0(%rax,%rax,1)                                                     │
│   0x2136e0 <mdb_xcursor_init2>            test   $0x7,%dil                                                                         │
│   0x2136e4 <mdb_xcursor_init2+4>          jne    0x213830 <mdb_xcursor_init2+336>                                                  │
│   0x2136ea <mdb_xcursor_init2+10>         test   %rdi,%rdi                                                                         │
│   0x2136ed <mdb_xcursor_init2+13>         je     0x213830 <mdb_xcursor_init2+336>                                                  │
│   0x2136f3 <mdb_xcursor_init2+19>         add    $0x10,%rdi                                                                        │
│   0x2136f7 <mdb_xcursor_init2+23>         test   $0x7,%dil                                                                         │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
multi-thre Thread 0x7ffff7c8c7 In: mdb_xcursor_init1                                                               L??   PC: 0x2136d3 
(gdb) layout asm

... and for some reason, there is a ud2 (undefined) instruction right after ret.

Let's see what the assembly looks like around the same code region in the binary emitted by gcc:

(gdb) b mdb_xcursor_init1
Breakpoint 1 at 0x40de0b
(gdb) r
Starting program: /home/lith/Desktop/lmdb-zig/test 
warning: File "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/nix/store/isy60my0ijjzh49rscgdb1i2457nf7lp-gcc-9.3.0-lib".
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

Breakpoint 1, 0x000000000040de0b in mdb_xcursor_init1 ()
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   0x40e012 <mdb_xcursor_init1+523>        jne    0x40e023 <mdb_xcursor_init1+540>                                                  │
│   0x40e014 <mdb_xcursor_init1+525>        mov    -0x8(%rbp),%rax                                                                   │
│   0x40e018 <mdb_xcursor_init1+529>        movq   $0x407511,0x1c8(%rax)                                                             │
│   0x40e023 <mdb_xcursor_init1+540>        nop                                                                                      │
│   0x40e024 <mdb_xcursor_init1+541>        leave                                                                                    │
│   0x40e025 <mdb_xcursor_init1+542>        ret                                                                                      │
│   0x40e026 <mdb_xcursor_init2>            push   %rbp                                                                              │
│   0x40e027 <mdb_xcursor_init2+1>          mov    %rsp,%rbp                                                                         │
│   0x40e02a <mdb_xcursor_init2+4>          push   %rbx                                                                              │
│   0x40e02b <mdb_xcursor_init2+5>          mov    %rdi,-0x20(%rbp)                                                                  │
│   0x40e02f <mdb_xcursor_init2+9>          mov    %rsi,-0x28(%rbp)                                                                  │
│   0x40e033 <mdb_xcursor_init2+13>         mov    %edx,-0x2c(%rbp)                                                                  │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
native process 28572 In: mdb_xcursor_init1                                                                         L??   PC: 0x40de0b 
(gdb) layout asm

... and there is no ud2 instruction in the binary compiled with gcc after ret! Same goes for clang as well.

In conclusion, C code compiled with zig cc keeps getting a ud2 instruction emitted right after the ret instruction of static functions, for reasons I can't explain. I hit the same issue with test code I wrote for libmdbx.
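
For readers unfamiliar with the pattern: on x86-64, ud2 is what Clang and GCC emit for __builtin_trap(), and compilers commonly place such trap blocks after a function's ret so that the normal path falls straight through. Landing on it means an earlier conditional branch jumped there, not that execution ran past ret. The pair test $0x7,%dil / test %rdi,%rdi at the top of mdb_xcursor_init2 in the zig cc listing has the shape of an alignment check and a null check feeding such a branch. The sketch below is a hand-written C analogue of that shape. checked_load_u64 and checked_load_demo are hypothetical names, not LMDB functions, and the assumption that the compiler-inserted checks look like this is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of LMDB: loads a 64-bit value after the two
 * checks whose shape appears in the zig cc disassembly (low three address
 * bits clear, then pointer non-NULL). */
uint64_t checked_load_u64(const uint64_t *p) {
  if (((uintptr_t)p & 0x7) != 0 || p == NULL) {
    /* On x86-64, GCC and Clang lower this builtin to a ud2 instruction. */
    __builtin_trap();
  }
  return *p;
}

/* Small self-check: an aligned, non-NULL pointer passes both checks. */
void checked_load_demo(void) {
  uint64_t value = 42;
  assert(checked_load_u64(&value) == 42);
}
```

Calling checked_load_u64 with a misaligned or NULL pointer would terminate the process with SIGILL, which matches the "Illegal instruction (core dumped)" symptom above.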

The same illegal-instruction issue also came up for sqlite3, which @nektro brought to my attention.

Might this be due to additional assertion checks that the Zig compiler emits when analyzing static functions? Or might it just be the result of a C compiler flag that should have been set or cleared?

Would appreciate any assistance on this 🙏.

@daurnimator
Collaborator

... and for some reason, there is a ud2 (undefined) instruction right after ret.

https://github.com/ziglang/zig/wiki/FAQ#why-do-i-get-illegal-instruction-when-using-with-zig-cc-to-build-c-code

lithdew commented Jan 1, 2021

... and for some reason, there is a ud2 (undefined) instruction right after ret.

https://github.com/ziglang/zig/wiki/FAQ#why-do-i-get-illegal-instruction-when-using-with-zig-cc-to-build-c-code

😞 Yep, that's the reason. It turns out LLVM's undefined behavior sanitizer (UBSan) inserted trapping checks into some of the functions in those C libraries. Is there any way to get a verbose warning at compile time if UBSan gets triggered?

EDIT: It looks like there is an issue filed already for it at #5163. I'll close this issue.
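
The thread does not identify which UBSan check fired in mdb_xcursor_init1. One class of undefined behavior that code overlaying structures on raw page buffers can run into is dereferencing a misaligned pointer; treating that as the cause here is an assumption, not something confirmed above. As a minimal sketch of the usual way to avoid that class of problem, the hypothetical helper below (load_u64_unaligned, not an LMDB function) copies the bytes with memcpy instead of casting the address to uint64_t *.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a uint64_t from any byte address without
 * dereferencing a possibly misaligned uint64_t pointer. */
uint64_t load_u64_unaligned(const unsigned char *bytes) {
  uint64_t value;
  memcpy(&value, bytes, sizeof value);
  return value;
}

/* Round trip through a deliberately odd offset inside a byte buffer. */
void load_u64_demo(void) {
  unsigned char buf[1 + sizeof(uint64_t)] = {0};
  uint64_t in = 0x0123456789abcdefULL;
  memcpy(buf + 1, &in, sizeof in);
  assert(load_u64_unaligned(buf + 1) == in);
}
```

When patching the library is not an option, Clang's -fno-sanitize=undefined flag can be appended to the zig cc command line to turn the checks off; the FAQ entry linked above is the authoritative description of the default behavior.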

@lithdew lithdew closed this as completed Jan 1, 2021
@andrewrk andrewrk added this to the 0.8.0 milestone Jan 3, 2021