Unexpected DistributedLockImpl behavior? #5

Open
ntolia opened this Issue Aug 5, 2011 · 8 comments

Comments

Projects
None yet
4 participants

ntolia commented Aug 5, 2011

I was trying to use com.twitter.common.zookeeper.DistributedLockImpl and discovered that, after a successful lock acquisition, a disconnect from the server or a session timeout would never release the lock (the internal holdsLock will always stay true). Those events are only detected at lock acquisition time. Is this the expected behavior?

ntolia commented Aug 5, 2011

And, while we are on this issue, why does the lock convert a ZooKeeper exception into a runtime/unchecked exception in cleanup() - https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/DistributedLockImpl.java#L206 ? Makes it complicated to properly unlock if a disconnection happens.

ntolia commented Aug 6, 2011

Another unexpected thing. tryLock(long, TimeUnit) does not actually timeout as expected if no ZK server is found and instead seems to block forever. I believe this is because of prepare() which calls ZooKeeperUtils.ensurePath(). ensurePath() can block forever because it calls ZooKeeperClient.get().

Contributor

jsirois commented Aug 16, 2011

Sorry - just noticed this issue. There seem to be a few bugs here:
1.) tryLock(long, TimeUnit) clocking forever is clearly one bug.
2.) cleanup() / the ublic methods should throw documented unchecked exceptions at the very least

The 1st issue you mention can't be fully right. A session timeout implies the lock ephemeral node dies which release the lock. A disconnect is expected to happen and should not affect holdig of the lock, only session expiry or explict release should relinquish the lock.

ntolia commented Aug 16, 2011

About the first issue, lets say that a client creates a lock on a particular path. It manages to grab the lock but is disconnected at some future point in time. The ephemeral node gets destroyed and some other client can therefore grab the lock. However, the first client still assumes it has it and the lock will not get any notification.

With regards to the other bugs, I also found a few more issues. Would you be willing to accept a patch like https://github.com/ntolia/commons/commit/89280c50413f3b92ea142cf17e1c917dcbb119a3 ? I have a similar patch working in a private repository but cannot seem to get pants to work for me. After fixing some python incompatibility issues, it can't seem to find the ' spy#memcached;2.4.2' dependency.

ntolia commented Sep 22, 2011

Bump.

Contributor

wfarner commented Sep 27, 2011

Just wanted to ACK that this is on the radar. Expect some useful feedback by end of business week at the worst.

Contributor

florianleibert commented Sep 28, 2011

Hi -
just got around looking at this. Thanks for your input.

For the issue that you're reporting - would you mind updating the unit test to reflect the issue you are seeing? That way, I'll be able to verify that your patch works.

Thanks!
Florian

ntolia commented Oct 21, 2011

Florian, can you give me an idea of the unit tests you are looking for? Most of my patch has to do with correctness and cleanups. The only thing I can think of right now is a test that checks that tryLock(long timeout, TimeUnit unit) does not block forever if no server is present.

jcrobak referenced this issue in jcrobak/twitter-commons Nov 9, 2012

Merge pull request #5 from jcrobak/fix-no-color
fixes no-color option on cmdline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment