LockWaitTimeout causes lock to be forever unusable #49

Closed
jlaban opened this Issue Sep 21, 2012 · 4 comments

Comments

Projects
None yet
2 participants

jlaban commented Sep 21, 2012

Hi,

This is in reference to issue #40, where you added the very useful functionality of being able to specify a timeout when waiting for locks.

I've just started testing it out, and there seems to be one problem with it: if you don't succeed in getting a lock (i.e. you receive a LockWaitTimeoutError), the lock can forevermore no longer be acquired by any other process or by this process, until this process is killed. So deadlock.

What seems to happen is that the little ephemeral lock attempt node seems to stick around in the lock's parent node, even though the process gave up on the attempt. Would it be possible to clean this node up when we time out? It would be similar to how it gets cleaned up if we don't block waiting, and the lock attempt is unsuccessful.

(The problem is actually quite bad, because if there's contention, everybody is timing out and leaving their little ephemeral lock attempts around, and even if you aggressively kill the processes if they fail to acquire the locks, there's a bit of delay there, so you pretty much get livelock rather than deadlock.)

Thanks!
John

jlaban commented Sep 21, 2012

Just as a side note, I've tried calling unlock regardless if I acquired the lock or not, and it hasn't helped... if there's some other trick I'm overlooking, please let me know.

Contributor

slyphon commented Sep 21, 2012

gah, that sucks!! what a terrible bug :P

Ok, I'll have a look once i get home, i'm currently on the train. Hopefully i can get this patched tonight

jlaban commented Sep 21, 2012

That would be awesome; thanks Jonathan. Hopefully it's an easy fix.

slyphon added a commit that referenced this issue Sep 22, 2012

Cleanup lock path after timeout error on locker
Fix issue #49, where a timed-out lock would never be cleaned up

@slyphon slyphon closed this in 518bd87 Sep 22, 2012

jlaban commented Sep 24, 2012

Thanks for the quick fix. The new code seems to work great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment