
log writing failed. can't be called from trap context #598

Closed
GeminPatel opened this issue Jun 11, 2018 · 11 comments

Comments

@GeminPatel

When a racecar consumer is running and I send a SIGTERM to it, it prints the following:

Jun 11 18:55:49 ip-10-212-33-14 systemd[1]: Stopping App client consumer. Instance=1....
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 systemd[1]: Stopped App client consumer. Instance=1..

This doesn't happen every time, but it happens often.

I think this may be because we cannot use a logger inside a signal handler (https://bugs.ruby-lang.org/issues/7917), but ruby-kafka is doing exactly that. If you follow racecar's trap signal handling, you eventually land in ruby-kafka code that calls logger.info.

I read this somewhere, and it seems really helpful:

The way to resolve this is, for example, to use IO.pipe and puts into the pipe in the signal handler, then handle the signal somewhere else, as sidekiq does: https://github.com/mperham/sidekiq/blob/b6de0bae6b86d5abae25852141768d8ecc8ddedf/lib/sidekiq/cli.rb#L53
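The quoted advice can be sketched roughly like this (hypothetical names, not racecar's actual implementation): the trap handler only writes a byte to a pipe, and the logging and shutdown happen later in normal, non-trap context.

```ruby
require "logger"

# Self-pipe pattern: the trap handler only writes to a pipe;
# the main loop reads from it and does the real work.
read_io, write_io = IO.pipe
logger = Logger.new($stdout)

%w[TERM INT].each do |sig|
  Signal.trap(sig) do
    # Only a pipe write here. Calling Logger#info inside the trap
    # handler is what produces "log writing failed. can't be called
    # from trap context", because Logger locks a mutex.
    write_io.puts(sig)
  end
end

# For demonstration only: deliver SIGTERM to ourselves from a thread.
Thread.new do
  sleep 0.2
  Process.kill("TERM", Process.pid)
end

# Main loop: block until a signal name arrives on the pipe, then do
# the logging and shutdown from normal (non-trap) context.
signal = read_io.gets.strip
logger.info("Received SIG#{signal}, shutting down consumer")
```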

  • Version of Ruby: 2.3.1
  • Version of Kafka: 0.11.0.1
  • Version of ruby-kafka: 0.5.2
@mensfeld
Contributor

It's a common and known problem. You can bypass it by running the logging inside a thread spawned from the signal handler. This is dirty, but it works in 99.999% of cases.
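A minimal sketch of that thread trick (hypothetical names, not racecar's or karafka's actual code): a thread spawned inside the trap handler runs outside trap context, so it is allowed to acquire the Logger's mutex.

```ruby
require "logger"
require "stringio"

# Log to a StringIO so the demo is easy to inspect.
log_device = StringIO.new
logger = Logger.new(log_device)

handler_thread = nil
Signal.trap("TERM") do
  # Calling logger.info directly here would fail: Logger synchronizes
  # writes with a mutex, and mutexes can't be locked in trap context.
  # A freshly spawned thread runs outside trap context, so it can log.
  handler_thread = Thread.new do
    logger.info("SIGTERM received, shutting down consumer")
  end
end

# For demonstration only: send ourselves SIGTERM, then wait for the
# logging thread to be created and to finish.
Process.kill("TERM", Process.pid)
sleep 0.05 until handler_thread
handler_thread.join
```

As mensfeld notes below, this does not theoretically rule out a deadlock, but in practice it is enough even for heavy instrumentation.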

@mensfeld
Contributor

Also, this is racecar-related, not ruby-kafka. Could you please move it here: https://github.com/zendesk/racecar

@GeminPatel
Author

GeminPatel commented Jun 11, 2018

@mensfeld Do you think it is happening due to some of my own logging? I think it is due to @logger.info "Disconnecting broker #{id}" in the broker_pool class.

PS: I only spent 10 minutes looking into this issue, as I was busy creating services. I will look into it in more detail, since it hangs termination quite a bit.

@mensfeld
Contributor

mensfeld commented Jun 11, 2018

@GeminPatel Yes, but handling signals is not part of ruby-kafka, as it is "runtime environment" agnostic. It's a bug in how racecar handles the trap context and in what it does there with the ruby-kafka client, not a ruby-kafka bug. The stacktrace does not always point to the error's source. I know that because I had a similar problem with Karafka and managed to bypass it by spawning a new thread: https://github.com/karafka/karafka/blob/1.3-wip/lib/karafka/status.rb#L24

It does not (theoretically) always prevent a deadlock, but it is more than enough even for extensive instrumentation.

@GeminPatel
Author

GeminPatel commented Jun 11, 2018

@mensfeld: I created the issue here because I thought racecar was using the ruby-kafka API correctly, and because it seems expected that racecar [or any other wrapper around ruby-kafka] will eventually call this broker-pool close method from the trap handler. So I thought it would be better for ruby-kafka to handle this. But if you say so, I will create a ticket there as well. As you said, though, they can't really fix this 100% on their own without monkey-patching or rewriting some classes. Still, I agree that ruby-kafka should stay agnostic. :(

@mensfeld
Contributor

I don't think this should be fixed in ruby-kafka. I hope @dasch does not disagree with me on that. As I said, ruby-kafka is not context-aware and provides a different layer of the Kafka processing stack than racecar (and other frameworks). This happens because of the way racecar uses ruby-kafka within the trap context, and because of the instrumentation and logging involved. Nonetheless, it's still a racecar issue that involves ruby-kafka, not the other way around.

@GeminPatel
Author

@mensfeld Given that the production logging level is almost always info, and that racecar can never fix this issue 100% on its own without monkey-patching or rewriting some classes, I don't know what other way there is to be sure this does not occur. But I think the thread approach is workable if we really do not want to change ruby-kafka.

@mensfeld
Contributor

Keep in mind that just because a trap occurs, it does not follow that the ruby-kafka client needs to receive the shutdown signal from within the trap handler. There are other patterns for handling that.

@dasch
Collaborator

dasch commented Jun 12, 2018

Yeah, I think Racecar should ensure that the calls to ruby-kafka don't happen from a trap context. Can you open an issue on Racecar?

@GeminPatel
Author

@dasch Already did. I have referenced this issue there as well.

@0x2c7
Contributor

0x2c7 commented Sep 22, 2018

Since this issue belongs to Racecar and is not directly related to ruby-kafka, I'll close it.

@0x2c7 0x2c7 closed this as completed Sep 22, 2018