This file notes feature differences and bugfixes contained between releases.
- Nasty bug in Locker (#54)
If a locker is waiting on the lock, and a connection interruption occurs (that doesn't render the session invalid), the waiter will attempt to clean up while the connection is invalid, and not succeed in cleaning up its ephemeral node. This patch will recognize that the `@lock_path` was already acquired, and just wait on the current owner (i.e. it won't create an erroneous third lock node). The reproduction code has been added under
- bug fix for "Callbacks Hash in EventHandlerSubscription::Base gets longer randomly" (#52)
I'd like to point out that the callbacks hash gets longer deterministically, depending on what callbacks get registered. This patch will do further cleanup so as not to leave empty arrays littering the EventHandler.
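A minimal sketch of the kind of cleanup described above, assuming a registry that maps a path to an array of callbacks. `CallbackRegistry` and its methods are hypothetical names used for illustration; this is not ZK's actual EventHandler code.

```ruby
# Hypothetical registry, not ZK's real EventHandler.
class CallbackRegistry
  def initialize
    @callbacks = {}
  end

  # Register a block for a path and return it so it can be removed later.
  def register(path, &block)
    (@callbacks[path] ||= []) << block
    block
  end

  # Remove the block, and drop the key entirely once its array is empty,
  # so the hash does not accumulate empty arrays.
  def unregister(path, block)
    list = @callbacks[path]
    return if list.nil?
    list.delete(block)
    @callbacks.delete(path) if list.empty?
  end

  # Number of paths that still have at least one callback.
  def size
    @callbacks.size
  end
end

registry = CallbackRegistry.new
sub = registry.register('/foo') { |event| puts event }
registry.unregister('/foo', sub)
puts registry.size
```

The key point is the final `delete` in `unregister`: without it, every path that ever had a callback would leave an empty array behind.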
- bug fix for "Ephemeral node for exclusive lock not cleaned up when failure happens during lock acquisition" (#51)
- Fixes nasty bug "LockWaitTimeout causes lock to be forever unusable" (#49)
The code path in the case of a LockWaitTimeout would skip the lock node cleanup, so a given lock name would become unusable until the timed-out-locker's session went away. This fixes that case and adds specs.
- Added Locker timeout feature for blocking calls. (issue #40)
Previously, when dealing with locks, there were only two options: blocking or non-blocking. In order to come up with a time-limited lock, you had to poll every so often until you acquired the lock. This is, needless to say, inefficient, and it doesn't allow for fair acquisition.
A timeout option has been added so that, when blocking while waiting for a lock, you can specify a deadline by which the lock should have been acquired.

    zk = ZK.new
    locker = zk.locker('lock name')

    begin
      locker.lock(:wait => 5.0)   # wait up to 5.0 seconds to acquire the lock
    rescue ZK::Exceptions::LockWaitTimeoutError
      $stderr.puts "could not acquire the lock in time"
    end
The timeout is also available through the `with_lock` convenience method:

    zk = ZK.new

    begin
      zk.with_lock('lock name', :wait => 5.0) do |lock|
        # do stuff while holding lock
      end
    rescue ZK::Exceptions::LockWaitTimeoutError
      $stderr.puts "could not acquire the lock in time"
    end
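To illustrate the difference between polling and waiting with a deadline, here is a minimal in-process sketch. `TimedLock` and `TimedLock::WaitTimeout` are hypothetical names; the real Locker works against ZooKeeper znodes, and this sketch does not reproduce its fair ordering of waiters.

```ruby
# Hypothetical in-process lock, for illustration only.
class TimedLock
  class WaitTimeout < StandardError; end

  def initialize
    @mutex = Mutex.new
    @cond  = ConditionVariable.new
    @held  = false
  end

  # Block until the lock is acquired or `wait` seconds have elapsed.
  def lock(wait:)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + wait
    @mutex.synchronize do
      while @held
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        raise WaitTimeout, "could not acquire the lock in time" if remaining <= 0
        @cond.wait(@mutex, remaining) # sleeps until signalled or timed out
      end
      @held = true
    end
    true
  end

  def unlock
    @mutex.synchronize do
      @held = false
      @cond.signal
    end
  end
end
```

The waiter sleeps until it is signalled or the deadline passes, rather than waking up periodically to retry.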
- Remove unnecessary dependency on backports gem
- Fix for use in resque! A small bug was preventing resque from activating the fork hook.
- Retry when lock creation fails due to a NoNode exception
- Change state call to reduce the chances of deadlocks
One of the problems I've been seeing is that during some kind of shutdown event, some method will call `connected?`, which will acquire a mutex and make a call on the underlying connection at the exact moment necessary to cause a deadlock. In order to help prevent this, and building on some changes from 1.5.3, we now treat our cached `@last_cnx_state` as the current state of the connection and don't touch the underlying connection object (except in the case of the java driver, which is safe).
- Small fixes for zk-eventmachine compatibility
- Locker cleanup code!
When a session is lost, it's likely that the locker's node name was left behind. So for `zk.locker('foo')`, if the session is interrupted, it's very likely that the `/_zklocking/foo` znode has been left behind. A method has been added to allow you to safely clean up these stale znodes:

    ZK.open('localhost:2181') do |zk|
      ZK::Locker.cleanup(zk)
    end

This will go through your locker nodes one by one and try to lock and unlock them. If it succeeds, the lock is naturally cleaned up (as part of the normal teardown code). If it doesn't acquire the lock, no harm is done: it knows that the lock is still in use.
`create('/path', 'data', :or => :set)` will create a node (and all parent paths) with the given data, or set its contents if it already exists. It's intended as a convenience when you just want a node to exist with a particular value.
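The create-or-set semantics can be sketched against an in-memory stand-in for a znode tree. `FakeTree` is hypothetical and exists only to show the behavior; it is not how ZK talks to ZooKeeper.

```ruby
# Hypothetical in-memory stand-in for a znode tree, for illustration only.
class FakeTree
  class NodeExists < StandardError; end

  def initialize
    @nodes = { '/' => '' }
  end

  def get(path)
    @nodes.fetch(path)
  end

  def create(path, data, opts = {})
    if opts[:or] == :set
      ensure_parents(path)
      @nodes[path] = data # create the node, or overwrite it if present
    else
      raise NodeExists, path if @nodes.key?(path)
      @nodes[path] = data
    end
    path
  end

  private

  # Create every missing ancestor of path with empty data.
  def ensure_parents(path)
    parts = path.split('/').reject(&:empty?)
    parts[0...-1].each_index do |i|
      parent = '/' + parts[0..i].join('/')
      @nodes[parent] ||= ''
    end
  end
end

tree = FakeTree.new
tree.create('/a/b/c', 'one', :or => :set) # creates /a, /a/b and /a/b/c
tree.create('/a/b/c', 'two', :or => :set) # node exists, so its data is replaced
puts tree.get('/a/b/c')
```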
Fixed reconnect code. There was an occasional race/deadlock condition caused because the reopen call was done on the underlying connection's dispatch thread. Closing the dispatch thread is part of reopen, so this would cause a deadlock in real-world use. Moved the reconnect logic to a separate, single-purpose thread on ZK::Client::Threaded that watches for connection state changes.
'private' is not 'protected'. I've been writing ruby for several years now, and apparently I'd forgotten that 'protected' does not work the way it does in java. The visibility of these methods has been corrected, and all specs pass, so I don't expect issues, but please report if this change causes any bugs in user code.
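For readers who want a reminder of the distinction, here is a small plain-Ruby illustration. The `Account` class is made up for this example and has nothing to do with ZK's internals.

```ruby
class Account
  def initialize(balance)
    @balance = balance
  end

  # Calls a protected method on another instance of the same class: allowed.
  def richer_than?(other)
    balance > other.balance
  end

  # Calls a private method on another instance: raises NoMethodError.
  def same_pin?(other)
    pin == other.pin
  end

  protected

  def balance
    @balance
  end

  private

  def pin
    1234
  end
end

a = Account.new(10)
b = Account.new(5)
puts a.richer_than?(b)

begin
  a.same_pin?(b)
rescue NoMethodError => e
  puts "private call rejected: #{e.class}"
end
```

Neither method can be called from outside the class, but a protected method accepts another instance of the same class as an explicit receiver, while a private one does not.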
Fix locker cleanup code to avoid a nasty race when a session is lost, see issue #34
Fix potential deadlock in ForkHook code so the mutex is unlocked in the case of an exception
Do not hang forever when shutting down and the shutdown thread does not exit (wait 30 seconds).
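A bounded wait of this kind can be sketched with `Thread#join`, which accepts a timeout and returns `nil` when the thread has not finished in time. `wait_for_shutdown` is a hypothetical helper name, not a ZK method.

```ruby
# Returns true if the thread exited within `timeout` seconds, false otherwise.
def wait_for_shutdown(thread, timeout = 30)
  !thread.join(timeout).nil?
end

worker = Thread.new { :done }
puts wait_for_shutdown(worker, 1)
```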
`:retry_duration` option added to the client constructor, which allows the user to specify how long, in the case of a connection loss, an operation should wait for the connection to be re-established before retrying the operation. This can be set at a global level and overridden on a per-call basis. The default is to not retry (which may change at a later date). Generally speaking, a timeout of > 30s is probably excessive, and care should be taken because during a connection loss, the server-side state may change without you being aware of it (i.e. events will not be delivered).
Small fork-hook implementation fix. Previously we were using WeakRefs so that hooks would not prevent an object from being garbage collected. This has been replaced with a finalizer which is more deterministic.
Ok, now seriously this time. I think all of the forking issues are done.
Implemented a 'stop the world' feature to ensure safety when forking. All threads are stopped, but state is preserved.
`fork()` can then be called safely, and after fork returns, all threads will be restarted in the parent, and the connection will be torn down and reopened in the child.
The easiest, and supported, way of doing this is now to call `ZK.install_fork_hook` after requiring zk. This will install an `alias_method_chain` style hook around the `Kernel.fork` method, which handles pausing all clients in the parent, calling fork, then resuming in the parent and reconnecting in the child. If you're using ZK in resque, I highly recommend using this approach, as it will give the most consistent results.
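The hook style mentioned above can be sketched in plain Ruby. `Worker#spawn` is a hypothetical stand-in for the wrapped method, and the `EVENTS` array only records the order of calls; this is not the code that `ZK.install_fork_hook` installs.

```ruby
# Hypothetical class standing in for the object whose method is wrapped.
class Worker
  def spawn
    :spawned
  end
end

# Reopen the class and wrap the method, alias_method_chain style.
class Worker
  EVENTS = []

  alias_method :spawn_without_hook, :spawn # keep the original reachable

  def spawn_with_hook
    EVENTS << :before # e.g. pause clients before the real call
    result = spawn_without_hook
    EVENTS << :after  # e.g. resume or reconnect afterwards
    result
  end

  alias_method :spawn, :spawn_with_hook # callers now get the wrapped version
end

puts Worker.new.spawn
```

Callers keep using the original method name and get the surrounding behavior without changing their code.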
Logging is now off by default, and uses the excellent, can't-recommend-it-enough, logging gem. If you want to tap into the ZK logs, you can assign a stdlib compliant logger to `ZK.logger` and that will be used. Otherwise, you can use the Logging framework's controls. All ZK logs are consolidated under the 'ZK' logger instance.
- True fork safety! The `zookeeper` gem at 1.1.0 is finally fork-safe. You can now use ZK in whatever forking library you want. Just remember to call `#reopen` on your client instance in the child process before attempting any operations.
- Added a new `:ignore` option for convenience when you don't care if an operation fails. In the case of a failure, the method will return nil instead of raising an exception. This option works for
`stat` will ignore the option (because it doesn't care about the state of a node).

    # so instead of having to do:
    begin
      zk.delete('/some/path')
    rescue ZK::Exceptions::NoNode
    end

    # you can do
    zk.delete('/some/path', :ignore => :no_node)
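One way such an option could be implemented is a small wrapper that swallows only the named error. This is a sketch under assumptions: `IgnoreOption`, its `ERRORS` table, and its `NoNode` class are hypothetical stand-ins, not ZK's implementation.

```ruby
# Hypothetical helper, for illustration only.
module IgnoreOption
  class NoNode < StandardError; end # stand-in for ZK::Exceptions::NoNode

  ERRORS = { :no_node => NoNode }

  # Run the block; if it raises an error named by opts[:ignore], return nil.
  # Any other error is re-raised unchanged.
  def self.call(opts = {})
    yield
  rescue StandardError => e
    ignored = Array(opts[:ignore]).map { |name| ERRORS[name] }.compact
    raise unless ignored.any? { |klass| e.is_a?(klass) }
    nil
  end
end

result = IgnoreOption.call(:ignore => :no_node) { raise IgnoreOption::NoNode }
puts result.inspect
```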
- MASSIVE fork/parent/child test around event delivery and much greater stability expected for linux (with the zookeeper-1.0.3 gem). Again, please see the documentation on the wiki about proper fork procedure.
- fix a bug where a forked client would not have its 'outstanding watches' cleared, so some events would never be delivered
Phusion Passenger and Unicorn users are encouraged to upgrade!
`fork()`: ZK should now work reliably after a `fork()` if you call `reopen()` ASAP in the child process (before continuing any ZK work). Additionally, your event-handler (blocks set up with `zk.register`) will still work in the child. You will have to make calls like `zk.stat(path, :watch => true)` to tell ZooKeeper to notify you of events (as the child will have a new session), but everything should work.
See the fork-handling documentation on the wiki.
You are STRONGLY ENCOURAGED to go and look at the CHANGELOG from the zookeeper 1.0.0 release
NOTICE: This release uses the 1.0 release of the zookeeper gem, which has had a MAJOR REFACTORING of its namespaces. Included in that zookeeper release is a compatibility layer that should ease the transition, but any references to the Zookeeper* hierarchy should be changed.
Refactoring related to the zookeeper gem, use all the new names internally now.
Create a new Subscription class that will be used as the basis for all subscription-type things.
Add new Locker features!
`LockerBase#assert!` - will raise an exception if the lock is not held. This check is not only for local in-memory "are we locked?" state, but will check the connection state and re-run the algorithmic tests that determine if a given Locker implementation actually has the lock.
`LockerBase#acquirable?` - an advisory method that checks if any condition would prevent the receiver from acquiring the lock.
Deprecation of the `unlock!` methods. These may change to be exception-raising in a future release, so document and refactor that `unlock` are the way to go.
Fixed a race condition in `event_catcher_spec.rb` that would cause 100% cpu usage and hang.
Documentation for Locker and ilk
Fixes for Locker tests so that we can run specs against all supported ruby implementations on travis (relies on in-process zookeeper server in the zk-server-1.0.1 gem)
Support for 1.8.7 will be continued
(forgot to put this here, put it in the readme though)
NEW! Thread-per-Callback event delivery model! Read all about it! Provides a simple, sane way to increase the concurrency in your ZK-based app while maintaining the ordering guarantees ZooKeeper makes. Each callback can perform whatever work it needs to without blocking other callbacks from receiving events. Inspired by Celluloid's actor model.
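The idea can be sketched as one queue and one thread per callback. `ThreadedCallback` is a hypothetical name, and this is a simplified illustration rather than ZK's implementation: each callback sees its own events in order, and a slow callback delays only itself.

```ruby
# Hypothetical sketch: each callback owns a queue and a worker thread.
class ThreadedCallback
  def initialize(&block)
    @block  = block
    @queue  = Queue.new
    @thread = Thread.new do
      # Events are popped in FIFO order, so per-callback ordering is preserved.
      while (event = @queue.pop) != :stop
        @block.call(event)
      end
    end
  end

  # Called by the dispatcher; returns immediately without running the block.
  def deliver(event)
    @queue << event
  end

  # Drain the queue, then stop the worker thread.
  def shutdown
    @queue << :stop
    @thread.join
  end
end

slow_seen = []
fast_seen = []
slow = ThreadedCallback.new { |e| sleep 0.01; slow_seen << e }
fast = ThreadedCallback.new { |e| fast_seen << e }

3.times { |i| slow.deliver(i); fast.deliver(i) }
slow.shutdown
fast.shutdown
```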
Use the zk-server gem to run a standalone ZooKeeper server for tests (`rake SPAWN_ZOOKEEPER=1`). Makes live-fire testing of any project that uses ZK easy to run anywhere!
Threaded client (the default one) will now automatically reconnect (i.e. `reopen()`) if an `AUTH_FAILED` event is received. Thanks to @eric for pointing out the nose-on-your-face obviousness and importance of this. If you want to handle these events yourself, and not automatically reopen, you can pass `:reconnect => false` to the constructor.
allow for both :sequence and :sequential arguments to create, because I always forget which one is the "right one"
add zk.register(:all) to receive node updates for all nodes (i.e. not filtered on path)
add 'interest' feature to zk.register, now you can indicate what kind of events should be delivered to the given block (previously you had to do that filtering inside the block). The default behavior is still the same: if no 'interest' is given, then all event types for the given path will be delivered to that block.

    zk.register('/path', :created) do |event|
      # event.node_created? will always be true
    end

or multiple kinds of events

    zk.register('/path', [:created, :changed]) do |event|
      # (event.node_created? or event.node_changed?) will always be true
    end
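The filtering that used to live inside each block can be sketched as follows. `InterestRegistry` and `SketchEvent` are hypothetical names for illustration and are not ZK's classes; the sketch also treats a path of `:all` as "match every path".

```ruby
# Hypothetical event and registry, for illustration only.
SketchEvent = Struct.new(:path, :type)

class InterestRegistry
  def initialize
    @subs = []
  end

  # interests: a symbol, an array of symbols, or nil for "all event types".
  def register(path, interests = nil, &block)
    @subs << [path, interests && Array(interests), block]
  end

  # Call only the blocks whose path and interests match the event.
  def deliver(event)
    @subs.each do |path, interests, block|
      next unless path == :all || path == event.path
      next unless interests.nil? || interests.include?(event.type)
      block.call(event)
    end
  end
end

registry = InterestRegistry.new
registry.register('/path', :created) { |event| puts "created: #{event.path}" }
registry.deliver(SketchEvent.new('/path', :created)) # delivered
registry.deliver(SketchEvent.new('/path', :deleted)) # filtered out
```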
create now allows you to pass a path and options, instead of requiring the blank string

    zk.create('/path', '', :sequential => true)   # previously required
    zk.create('/path', :sequential => true)       # now also accepted
fix for shutdown: close! called from threadpool will do the right thing
Chroot users rejoice! By default, ZK.new will create a chrooted path for you.

    ZK.new('localhost:2181/path', :chroot => :create)     # the default, create the path before returning connection
    ZK.new('localhost:2181/path', :chroot => :check)      # make sure the chroot exists, raise if not
    ZK.new('localhost:2181/path', :chroot => :do_nothing) # old default behavior

and, just for kicks

    ZK.new('localhost:2181', :chroot => '/path')          # equivalent to 'localhost:2181/path', :chroot => :create
Most of the event functionality used is now in a ZK::Event module. This is still mixed into the underlying slyphon-zookeeper class, but now all of the important and relevant methods are documented, and Event appears as a first-class citizen.
Support for 1.8.7 WILL BE DROPPED in v1.1. You've been warned.
The "Don't forget to update the RELEASES file before pushing a new release" release
Fix a fairly bad bug in event de-duplication (diff: http://is.gd/a1iKNc)
This is fairly edge-case-y but could bite someone. If you'd set a watch when doing a get that failed because the node didn't exist, any subsequent attempts to set a watch would fail silently, because the client thought that the watch had already been set.
We now wrap the operation in the setup_watcher! method, which rolls back the record-keeping of what watches have already been set for what nodes if an exception is raised.
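The rollback idea can be sketched like this. `WatchBook` and its `setup_watcher` are hypothetical names that only mimic the behavior described; the real `setup_watcher!` lives inside ZK and may differ in detail.

```ruby
require 'set'

# Hypothetical record-keeper for "which paths already have a watch set".
class WatchBook
  def initialize
    @watched = Set.new
    @mutex   = Mutex.new
  end

  def watching?(path)
    @mutex.synchronize { @watched.include?(path) }
  end

  # Yields true if a new watch should be set, false if one is already pending.
  # If the block raises, the record is rolled back so a later attempt can
  # set the watch again. The mutex is held while the block runs.
  def setup_watcher(path)
    @mutex.synchronize do
      added = !@watched.add?(path).nil?
      begin
        yield added
      rescue Exception
        @watched.delete(path) if added
        raise
      end
    end
  end
end

book = WatchBook.new
begin
  book.setup_watcher('/missing') { |_set_watch| raise "node does not exist" }
rescue RuntimeError
  # the failed operation left no record behind
end
puts book.watching?('/missing')
```

Holding the mutex for the duration of the block is what makes the operation a synchronization point.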
This change has the side-effect that certain operations (get,stat,exists?,children) will block event delivery until completion, because they need to have a consistent idea about what events are pending, and which have been delivered. This also means that calling these methods represent a synchronization point between user threads (these operations can only occur serially, not simultaneously).
- Default threadpool size has been changed from 5 to 1. This should only affect people who are using the Election code.
`ZK::Client::Base#register` delegates to its `event_handler` for convenience (so you can write `zk.register` instead of `zk.event_handler.register`, which always irked me)
`ZK::Client::Base#event_dispatch_thread?` added to more easily allow users to tell if they're currently in the event thread (and possibly make decisions about the safety of their actions). This is now used by `block_until_node_deleted` in the Unixisms module, and prevents a situation where the user could deadlock event delivery.
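A check of this kind can be sketched by comparing `Thread.current` with the dispatcher's own thread. `Dispatcher`, `dispatch_thread?` and `block_until_idle` are hypothetical names, not ZK's API. The guard matters because a blocking call made from the dispatch thread would wait for work that only that same thread can run.

```ruby
# Hypothetical single-threaded dispatcher, for illustration only.
class Dispatcher
  def initialize
    @queue  = Queue.new
    @thread = Thread.new do
      while (job = @queue.pop) != :stop
        job.call
      end
    end
  end

  def submit(&job)
    @queue << job
  end

  # True when the caller is running on the dispatcher's own thread.
  def dispatch_thread?
    Thread.current == @thread
  end

  # Blocks until all previously queued jobs have run. Refuses to run on the
  # dispatch thread, where it would wait on itself forever.
  def block_until_idle
    raise "would deadlock: called from the dispatch thread" if dispatch_thread?
    done = Queue.new
    submit { done << true }
    done.pop
  end

  def stop
    @queue << :stop
    @thread.join
  end
end

dispatcher = Dispatcher.new
dispatcher.submit { puts "in dispatch thread? #{dispatcher.dispatch_thread?}" }
dispatcher.block_until_idle
dispatcher.stop
```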
- Fixed issue 9, where using a Locker in the main thread would never awaken if the connection was dropped or interrupted. Now a `ZK::Exceptions::InterruptedSession` exception (or mixee) will be thrown to alert the caller that something bad happened.
`ZK::Find.find` now returns the results in sorted order.
- Added documentation explaining the Pool class, reasons for using it, reasons why you shouldn't (added complexities around watchers and events).
- Began work on an experimental Multiplexed client, that would allow multithreaded clients to more effectively share a single connection by making all requests asynchronous behind the scenes, and using a queue to provide a synchronous (blocking) API.