
MDB_BAD_RSLOT: Invalid reuse of reader locktable slot #21

Closed
iainlane opened this issue Sep 20, 2016 · 10 comments
@iainlane (Collaborator)

I got this:

object.Exception@source/contentsstore.d(64): mdb_txn_begin[-30783]: MDB_BAD_RSLOT: Invalid reuse of reader locktable slot
----------------
0x4c7f48 void contentsstore.ContentsStore.checkError(int, immutable(char)[])
        source/contentsstore.d:64
0x4c8052 bindings.lmdb.MDB_txn_s* contentsstore.ContentsStore.newTransaction(uint)
        source/contentsstore.d:141
0x4c8052 bool contentsstore.ContentsStore.packageExists(immutable(char)[])
        source/contentsstore.d:186
0x4d0cee __foreachbody5
        source/engine.d:184
0x4f67bb doIt
        /usr/lib/gcc/x86_64-linux-gnu/5/include/d/std/parallelism.d:3859
0x5d11cb void std.parallelism.AbstractTask.job()
        ../../../../src/libphobos/src/std/parallelism.d:415
0x5d11cb void std.parallelism.TaskPool.doJob(std.parallelism.AbstractTask*)
        ../../../../src/libphobos/src/std/parallelism.d:1083
0x5d11cb void std.parallelism.TaskPool.executeWorkLoop()
        ../../../../src/libphobos/src/std/parallelism.d:1138
0x5d11cb void std.parallelism.TaskPool.startWorkLoop()
        ../../../../src/libphobos/src/std/parallelism.d:1117
0x6197f6 void core.thread.Thread.run()
        ../../../../src/libphobos/libdruntime/core/thread.d:1364
0x6197f6 thread_entryPoint
        ../../../../src/libphobos/libdruntime/core/thread.d:371
0x7ffff5c4a6f9 start_thread
        ???:0
0x7ffff5562b5c clone
        ???:0
0xffffffffffffffff ???
        ???:0

The documentation says "A transaction and its cursors must only be used by a single thread, and a thread may only have a single transaction at a time." - is it possible for this to be violated here?
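For intuition about why that restriction matters, here is a toy model (plain Python, not the real LMDB internals; `ReaderTable` and its methods are made-up names for illustration) of LMDB's default behaviour: each OS thread owns at most one slot in the reader lock table, and beginning a second read transaction on the same thread while the slot is still in use trips exactly this kind of MDB_BAD_RSLOT check.

```python
import threading

MDB_BAD_RSLOT = -30783  # LMDB's error code for invalid reuse of a reader slot

class ReaderTable:
    """Toy model: one reader-lock-table slot per OS thread (LMDB's default)."""
    def __init__(self):
        self.slots = {}  # thread id -> slot state

    def txn_begin(self):
        tid = threading.get_ident()
        if self.slots.get(tid) == "in-use":
            # A read transaction already occupies this thread's slot.
            raise RuntimeError(
                f"mdb_txn_begin[{MDB_BAD_RSLOT}]: MDB_BAD_RSLOT: "
                "Invalid reuse of reader locktable slot")
        self.slots[tid] = "in-use"

    def txn_abort(self):
        self.slots[threading.get_ident()] = "free"

table = ReaderTable()
table.txn_begin()      # first read transaction on this thread: fine
try:
    table.txn_begin()  # second transaction, slot not yet released
except RuntimeError as e:
    print(e)
table.txn_abort()
table.txn_begin()      # after releasing the slot, reuse is fine again
```

So if two "D threads" ever shared one OS thread (or an OS thread id were reused while a slot was still marked in-use), this check would fire even though the code looks correct.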

@iainlane (Collaborator, Author)

I suppose it could happen if D threads don't correspond 1:1 to OS threads.

@iainlane (Collaborator, Author)

Can't find evidence supporting that theory though...

@ximion (Owner) commented Sep 20, 2016

Weird, I've never seen this error - which filesystem are you running LMDB on, and on which architecture?

I took care not to violate the threading restriction, so unless I made a mistake somewhere, there shouldn't be an error like this.
D threads map 1:1 to OS threads, unless you use fibers from std.concurrency, which behave like green threads / goroutines.

@ximion (Owner) commented Sep 20, 2016

Does it succeed if you try again or is this error persistent?

@iainlane (Collaborator, Author)

/dev/vda1 on / type ext4 (rw,relatime,data=ordered) [cloudimg-rootfs] on amd64 (xenial).

I tried 5 (or so) times and got it each time.

I've added MDB_NOTLS so that the reader locks are tied to transactions instead of threads, and there's been no crash so far. There's a warning in the docs to be careful with writes, but I don't know how to make that more thread-safe, since I don't understand what the problem was in the first place.
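Extending the earlier toy model (again hypothetical names, not real LMDB code): with MDB_NOTLS, a reader slot belongs to the transaction object rather than to the thread that created it, so two concurrent read transactions on one thread no longer collide.

```python
import itertools

class NotlsReaderTable:
    """Toy model of MDB_NOTLS: reader slots belong to transactions, not threads."""
    def __init__(self):
        self.slots = {}           # transaction id -> slot state
        self._ids = itertools.count()

    def txn_begin(self):
        txn_id = next(self._ids)  # each transaction gets its own slot
        self.slots[txn_id] = "in-use"
        return txn_id

    def txn_abort(self, txn_id):
        self.slots[txn_id] = "free"

table = NotlsReaderTable()
t1 = table.txn_begin()  # two read transactions on the same thread:
t2 = table.txn_begin()  # no shared per-thread slot, so no MDB_BAD_RSLOT
table.txn_abort(t1)
table.txn_abort(t2)
```

The trade-off the LMDB docs warn about is that the application must then keep read transactions serialized with respect to themselves (don't use one transaction object from two threads at once).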

@ximion (Owner) commented Sep 20, 2016

So why did I never see this? I am running asgen with LMDB 0.9.14-1 in production and 0.9.18-4 for testing; Xenial has 0.9.17-3.
But the only change in 0.9.18 was "Fix robust mutex detection on glibc 2.10-11 (ITS#8330)", which is probably (?) not relevant here.

@iainlane (Collaborator, Author)

I think that was a build fix...

Maybe the compiler optimisations or the precise versions of GDC are provoking the bug here, but I have no idea.

So I also tried using just a single ContentsStore (not sure why it isn't global anyway), and tried making the global one __gshared in case some weird copying was going on, but the crash still happened in both cases.

I've read through contentsstore.d and don't see a problem. All transactions and cursors seem to be freed properly.

@ximion (Owner) commented Sep 20, 2016

I think ContentsStore isn't global because the machine I run asgen on doesn't have much memory, and I made it created only on demand in an attempt to save some RAM. That was a while ago, though, and I think the contents store isn't actually the biggest RAM user anyway, so in theory we could make it shared again (make it a member of Engine). The things using the most RAM are the caching and the hash tables.

@iainlane (Collaborator, Author)

On this bug - what do you think about using MDB_NOTLS? I found out that python-lmdb turns it on all the time, so we were using it before...

@ximion (Owner) commented Sep 20, 2016

From reading other people's code, it seems like everyone switches MDB_NOTLS on. I was a bit worried about how this affects write operations, since we don't do any synchronization there ourselves and rely on LMDB to handle it properly.

But the docs say:

A thread can only use one transaction at a time, plus any child transactions. Each transaction belongs to one thread. See below. The MDB_NOTLS flag changes this for read-only transactions.

So, it looks like this only impacts read transactions anyway, and those are safe.
So, I think we can enable this by default :-)
