JNI Crash when calling MEDIAN function #418

Closed
tildedave opened this issue Apr 9, 2019 · 2 comments

@tildedave
commented Apr 9, 2019

Hi, I've run into a strange issue in JNI using the sqlite-jdbc driver and I was hoping someone here might have some ideas on how to get further. When calling the MEDIAN function on a sufficiently large array of sufficiently large doubles in descending order, a SIGSEGV causes my application to crash. The values being sorted from largest to smallest seems to matter, in that I can't reproduce this without it.

I've reproduced this crash in a project here: https://github.com/tildedave/sqlite-jdbc-crash and it crashes on both my Mac Powerbook and an x86_64 Virtualbox.

I wasn't able to reproduce this with a C program so my assumption is that this is some issue in how the JNI library is being built and packaged. Mostly I'm looking to understand this issue better so I can figure out what the right way to prevent it from happening is (we use these extension functions quite heavily).

Program Reproducing Error

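// Imports needed by this snippet. The statements below are the body of a
// method declared with `throws SQLException`.
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import org.sqlite.SQLiteConnection;

List<Double> values = new ArrayList<>();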

Random r = new Random();
for (int i = 0; i < 50_000; i++) {
    values.add((double) (r.nextInt(1_400_000)));
}
// Does not fail without reverse sorting
values.sort((o1, o2) -> -o1.compareTo(o2));

SQLiteConnection connection = (SQLiteConnection) DriverManager.getConnection("jdbc:sqlite::memory:");
connection.setAutoCommit(true);

try (Statement stmt = connection.createStatement()) {
    stmt.execute("CREATE TABLE table_0(\"num\" real NOT NULL)");
}

try (PreparedStatement stmt = connection.prepareStatement("INSERT INTO table_0 VALUES (?)")) {
    for(Double value: values) {
        stmt.setDouble(1, value);
        stmt.execute();
    }
}

try (Statement stmt = connection.createStatement()) {
    stmt.execute("SELECT MEDIAN(\"num\") FROM table_0");
}

Info From strace/GDB

On x86_64 Linux when run under strace the final failure is:

[pid 23956] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7fda523caff8} ---
[pid 23956] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---

The dumped core ends up looking like:

(gdb) bt 10
#0  0x00007f79925db7a0 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
#1  0x00007f79925db7a5 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
#2  0x00007f79925db7a5 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
#3  0x00007f79925db7a5 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
#4  0x00007f79925db7a5 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
#5  0x00007f79925db7a5 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
#6  0x00007f79925db7a5 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
#7  0x00007f79925db7a5 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
#8  0x00007f79925db7a5 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
#9  0x00007f79925db7a5 in ?? () from /tmp/sqlite-3.27.2-585d4328-ac13-4b5e-a8ef-288d40614f3d-libsqlitejdbc.so
(many similar frames follow)
#32255 0x00007f7993a6d4fc in ?? ()
#32256 0x00007f79aa44d6f0 in ?? ()
#32257 0x00007f7993a96ff0 in ?? ()
#32258 0x00007f7993b63db0 in ?? ()
#32259 0x00007f7993a6d5e8 in ?? ()
#32260 0x00007f79aa44d650 in ?? ()
#32261 0x00007f79aa44d6e0 in ?? ()
#32262 0x00007f79aa44d740 in ?? ()
#32263 0x00007f799437fa90 in ?? ()
#32264 0x0000000000000000 in ?? ()

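From what I found online, SEGV_ACCERR means the access hit mapped memory without the required permissions. Here that looks like a stack overflow: the backtrace is more than 32,000 frames deep, and the faulting address (0x7fda523caff8) sits 8 bytes below a page boundary, which is consistent with a push landing in a stack guard page.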

Other Things I Checked

  • Both 3.27.2.1 and 3.25.2 versions of xerial/sqlite-jdbc fail, no difference.
  • In-memory and on disk versions of SQLite both fail
  • I was unable to reproduce this with a C program that linked sqlite3 3.27.1 and compiled extension-functions.c in (both the version in this repo and the last-updated-2010 version on sqlite.org). The C program takes a while to perform the SELECT MEDIAN query (~10 seconds) but it does complete.

Thanks for any help that you can provide.

@tildedave

commented Sep 30, 2019

I traced the stack violation down to the code in extension-functions.c, in the xFinal flow (after each xStep has been run):

void node_iterate(node *n, map_iterator iter, void* p){
  if(n){
    if(n->l) {
      node_iterate(n->l, iter, p);
    }
    iter(n->data, n->count, p);
    if(n->r) {
      node_iterate(n->r, iter, p);
    }
  }
}

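I'm not too familiar with how the JNI stack gets set up and how it relates to the C stack, but on my machine it crashed after about 16k nested calls, so the stack is not deep enough for the 50,000-row reproduction. This also explains why the error only happens when everything is sorted from largest to smallest: assuming the tree is never rebalanced, descending input makes each new node the left child of the previous one, so the node_iterate(n->l, ...) call, which is not in tail position, nests once per distinct value. I checked node_insert (which gets run during xStep) and it is written in a tail-recursive style, so the compiler can optimize out the extra stack frames.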
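To illustrate why descending input is the worst case, here is a minimal sketch that builds an unbalanced binary tree and measures how long the chain of left children becomes. It assumes the map is a plain binary tree without rebalancing. The demo_node type and all demo_* function names are hypothetical and are not taken from extension-functions.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tree node; only the l/r links matter here. */
typedef struct demo_node {
  struct demo_node *l;
  struct demo_node *r;
  double value;
} demo_node;

/* Insert without rebalancing. A loop is used so insertion needs no extra stack.
   Returns 1 on success (or duplicate), 0 if allocation failed. */
static int demo_insert(demo_node **root, double value) {
  assert(root != NULL);
  demo_node **slot = root;
  while (*slot) {
    if (value == (*slot)->value) return 1; /* duplicate: nothing to add */
    slot = (value < (*slot)->value) ? &(*slot)->l : &(*slot)->r;
  }
  *slot = calloc(1, sizeof **slot);
  if (!*slot) return 0;
  (*slot)->value = value;
  return 1;
}

/* Length of the chain of left children starting at n. A recursive in-order
   walk nests this many calls before it visits its first node. */
static size_t demo_left_depth(const demo_node *n) {
  size_t depth = 0;
  for (; n; n = n->l) depth++;
  return depth;
}

/* Insert `count` distinct values in descending (or ascending) order and
   return the resulting left depth. */
size_t demo_left_depth_after_inserts(size_t count, int descending) {
  demo_node *root = NULL;
  for (size_t i = 0; i < count; i++) {
    double v = descending ? (double)(count - i) : (double)(i + 1);
    if (!demo_insert(&root, v)) break;
  }
  size_t depth = demo_left_depth(root);
  /* Sorted input yields a single chain, so freeing can follow one link. */
  while (root) {
    demo_node *next = root->l ? root->l : root->r;
    free(root);
    root = next;
  }
  return depth;
}
```

With descending input the left depth equals the number of distinct values, while ascending input gives a left depth of 1 and a long chain of right children instead. Each insert into such a chain walks every existing node, which may also be part of why the SELECT is slow on sorted data.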

@tildedave

commented Oct 1, 2019

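I was able to work around this by giving the JVM more stack space with the -Xss argument. Native code called through JNI runs on the stack of the calling Java thread, so its depth limit comes from the JVM's thread stack size rather than from the limit a standalone C program gets, which would explain why my C program did not crash. I'll bring up the behavior of this function with the SQLite mailing list and see if it might make more sense for it to use a heap-allocated (malloc'ed) stack for node traversal.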
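As a rough sketch of what a traversal with a malloc'ed stack could look like, the function below does the same in-order walk as node_iterate but keeps pending nodes in a growable heap array. This is not the actual SQLite code: the field names l, r, data and count come from the snippet quoted earlier, while the node and map_iterator typedefs, the name node_iterate_heap, the integer return value, and the sum_counts callback are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed shapes: field names follow the node_iterate snippet,
   the exact types are guesses. */
typedef struct node {
  struct node *l;
  struct node *r;
  void *data;
  int64_t count;
} node;

typedef void (*map_iterator)(void *data, int64_t count, void *p);

/* In-order traversal using an explicit heap-allocated stack.
   Returns 0 on success, -1 if memory could not be allocated
   (in which case some nodes may already have been visited). */
int node_iterate_heap(node *n, map_iterator iter, void *p) {
  assert(iter != NULL);
  size_t cap = 64, top = 0;
  node **stack = malloc(cap * sizeof *stack);
  if (!stack) return -1;

  while (n || top > 0) {
    /* Descend along left children, remembering each node on the way. */
    while (n) {
      if (top == cap) {
        node **grown = realloc(stack, 2 * cap * sizeof *stack);
        if (!grown) { free(stack); return -1; }
        stack = grown;
        cap *= 2;
      }
      stack[top++] = n;
      n = n->l;
    }
    /* Visit the most recently saved node, then move to its right subtree. */
    n = stack[--top];
    iter(n->data, n->count, p);
    n = n->r;
  }
  free(stack);
  return 0;
}

/* Hypothetical example callback: adds each node's count to an int64_t total. */
void sum_counts(void *data, int64_t count, void *p) {
  (void)data;
  *(int64_t *)p += count;
}
```

The C call stack stays at a constant depth no matter how lopsided the tree is. The cost is one heap array of pointers that grows to the tree's depth, plus an error path for allocation failure that the recursive version does not need.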

@tildedave tildedave closed this Oct 1, 2019