I've started seeing this message in my logs from time to time:
[ERROR] (zkrb.c:834: errno: Bad file descriptor) select returned: -1
at a rate of hundreds per second.
While diagnosing the problem, I deployed this additional error checking, and it looks like we aren't handling failures from zookeeper_interest() properly.
When I ran the patch, I got this exception:
ext/c_zookeeper.rb:262:in `zkrb_iterate_event_loop': zookeeper_interest failed: -7: operation timeout (RuntimeError)
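
For anyone who hasn't looked at the patch, the idea behind the check is roughly the following. This is only a sketch and not the patch itself: the helper name `check_interest_rc!` and the lookup table are made up for illustration, the description strings are what I believe zerror() reports, and in the real extension the return code comes out of the C event loop rather than being passed around in Ruby.

```ruby
# Return codes from the ZooKeeper C client (zookeeper.h); only a subset is listed.
ZOK               = 0
ZCONNECTIONLOSS   = -4
ZOPERATIONTIMEOUT = -7

# Descriptions as I believe zerror() reports them (an assumption, not verified here).
ZK_ERROR_TEXT = {
  ZCONNECTIONLOSS   => 'connection loss',
  ZOPERATIONTIMEOUT => 'operation timeout'
}.freeze

# Hypothetical helper: fail loudly when zookeeper_interest() reports an error,
# instead of carrying on to select() with whatever fd we were handed.
def check_interest_rc!(rc)
  return rc if rc == ZOK

  text = ZK_ERROR_TEXT.fetch(rc, 'unknown error')
  # A bare string passed to raise produces a RuntimeError.
  raise "zookeeper_interest failed: #{rc}: #{text}"
end
```

Raising here is deliberately blunt. It makes the failure visible, but it says nothing yet about how the event loop should recover.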
I'm not sure what the impact of zookeeper_interest() returning ZOPERATIONTIMEOUT is, but it doesn't look good to me.
I'm still investigating the correct behavior when we get a failure from zookeeper_interest().