
Bug: deadlock after connection session timeout? #20

Open
billmuch opened this Issue Mar 9, 2012 · 1 comment

billmuch commented Mar 9, 2012

Hi,

I found that a thread which cannot acquire the lock and is waiting on syncPoint.await() (line 125) is never notified if the connection session times out and the client then reconnects. The lock is dead from that point on: the lock file on the ZooKeeper server has already been deleted, so the waiting thread will never be notified again.

How to reproduce:
1. Write client code that waits on a lock (have some other thread write a lock file on the server first).
2. Disconnect the network by dropping the line.
3. Reconnect after a long enough time (after the server considers the client session timed out and deletes the lock file on the server).
You will then find that the lock file on the ZooKeeper server is already gone and the reconnected client waits on the SyncPoint forever.

Reason:
The session timeout event that the ZooKeeper client library sends to DistributedLockImpl is wrongly dropped by the check at line 266:
if (!event.getPath().equals(watchedNode)) {
    LOG.log(Level.INFO, "Ignoring call for node:" + watchedNode);
    return;
}

How to fix:
You may want to move the line 266 test into the else if (event.getType() == Event.EventType.NodeDeleted) block, so that it filters only node-deletion events and session events are no longer dropped.
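
To make the suggestion concrete, here is a minimal sketch of the reordered dispatch. It is not the actual DistributedLockImpl code: `EventType`, `SessionState`, `FakeEvent` and `LockWatcherSketch` are hypothetical stand-ins for the ZooKeeper client types so that the example runs without the library, and the recorded strings only name the action the real watcher would presumably take. It also assumes that session events carry no node path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the ZooKeeper event types (not the real API).
enum EventType { NONE, NODE_DELETED, NODE_CREATED }
enum SessionState { SYNC_CONNECTED, EXPIRED }

final class FakeEvent {
    final EventType type;
    final SessionState state;
    final String path; // assumption: null for session events

    FakeEvent(EventType type, SessionState state, String path) {
        this.type = type;
        this.state = state;
        this.path = path;
    }
}

final class LockWatcherSketch {
    private final String watchedNode;
    // Records which action each event triggered, in order.
    final List<String> actions = new ArrayList<>();

    LockWatcherSketch(String watchedNode) {
        this.watchedNode = watchedNode;
    }

    void process(FakeEvent event) {
        if (event.type == EventType.NONE) {
            // Session events are handled before any path comparison.
            if (event.state == SessionState.EXPIRED) {
                actions.add("cancelAttempt");
            } else {
                actions.add("reconnected");
            }
        } else if (event.type == EventType.NODE_DELETED) {
            // The path test now guards only node-deletion events.
            if (!watchedNode.equals(event.path)) {
                actions.add("ignored");
                return;
            }
            actions.add("checkForLock");
        } else {
            actions.add("unexpected");
        }
    }

    public static void main(String[] args) {
        LockWatcherSketch watcher = new LockWatcherSketch("/lock/member_1");
        watcher.process(new FakeEvent(EventType.NONE, SessionState.EXPIRED, null));
        System.out.println(watcher.actions);
    }
}
```

With this ordering an expiry event reaches the cancel path even though it has no path to compare, while a stale watch on some other node is still ignored.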

Could you take a look and fix this bug?

By the way, I am curious: is this code still used by Twitter?

Thanks
billmuch

@billmuch

Another deadlock found for the same edge case:
When the client detects the session timeout and calls cancelAttempt() to handle the event, a deadlock occurs. The event-handling thread tries to take the object lock (synchronized cancelAttempt()) so that it can release the syncPoint, but another thread already holds the object lock (synchronized lock()) and is waiting on the syncPoint, so it never releases the object lock.

Please check.
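
One possible way out, sketched below under my own assumptions, is to not hold the object lock while blocking on the syncPoint. `LockAttemptSketch`, `lockAcquired()` and `cancelUnblocksWaiter()` are hypothetical names for illustration; only `lock()`, `cancelAttempt()` and `syncPoint` echo the names used above, and the real methods certainly do more than this.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LockAttemptSketch {
    private final CountDownLatch syncPoint = new CountDownLatch(1);
    private boolean cancelled; // guarded by "this"
    private boolean holdsLock; // guarded by "this"

    // Not synchronized as a whole: the monitor is released before blocking.
    boolean lock(long timeout, TimeUnit unit) throws InterruptedException {
        synchronized (this) {
            if (cancelled) {
                return false;
            }
            // Setting up the watch on the predecessor node would go here.
        }
        // Wait without holding the monitor, so the event thread can get in.
        if (!syncPoint.await(timeout, unit)) {
            return false; // timed out
        }
        synchronized (this) {
            return holdsLock && !cancelled;
        }
    }

    // Called by the event thread when the session has expired.
    synchronized void cancelAttempt() {
        cancelled = true;
        syncPoint.countDown();
    }

    // Called by the event thread when this client becomes the lock owner.
    synchronized void lockAcquired() {
        holdsLock = true;
        syncPoint.countDown();
    }

    // Demo: a waiter blocked in lock() is released by cancelAttempt().
    static boolean cancelUnblocksWaiter() {
        LockAttemptSketch attempt = new LockAttemptSketch();
        boolean[] acquired = {true};
        Thread waiter = new Thread(() -> {
            try {
                acquired[0] = attempt.lock(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        attempt.cancelAttempt(); // plays the role of the event thread
        try {
            waiter.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !waiter.isAlive() && !acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("waiter released: " + cancelUnblocksWaiter());
    }
}
```

If lock() were declared synchronized here, as described above, the waiter would hold the monitor during await() and cancelAttempt() could not enter to count the latch down until the wait timed out.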
