
log writing failed. can't be called from trap context #598

Closed
GeminPatel opened this issue Jun 11, 2018 · 11 comments

Comments

@GeminPatel

When a racecar consumer is running and I send a SIGTERM to it, it prints the following:

Jun 11 18:55:49 ip-10-212-33-14 systemd[1]: Stopping App client consumer. Instance=1....
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 bash[9370]: log writing failed. can't be called from trap context
Jun 11 18:55:49 ip-10-212-33-14 systemd[1]: Stopped App client consumer. Instance=1..

This doesn't happen every time, but it happens often.

I think this may be because we cannot use a logger inside a signal handler (https://bugs.ruby-lang.org/issues/7917), but ruby-kafka is doing exactly that. If you follow racecar's trap signal handling, you eventually land in ruby-kafka code that calls logger.info.

I read this somewhere, and it seems really helpful:

The way to resolve this is, for example, to use IO.pipe and puts into the pipe in the signal handler, then handle the signal somewhere else, as sidekiq does: https://github.com/mperham/sidekiq/blob/b6de0bae6b86d5abae25852141768d8ecc8ddedf/lib/sidekiq/cli.rb#L53
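The quoted advice can be sketched roughly like this (hypothetical names, not racecar's actual implementation): the trap handler only writes a byte to a pipe, and the logging and shutdown happen later in normal, non-trap context.

```ruby
require "logger"

# Self-pipe pattern: the trap handler only writes to a pipe;
# the main loop reads from it and does the real work.
read_io, write_io = IO.pipe
logger = Logger.new($stdout)

%w[TERM INT].each do |sig|
  Signal.trap(sig) do
    # Only a pipe write here. Calling Logger#info inside the trap
    # handler is what produces "log writing failed. can't be called
    # from trap context", because Logger locks a mutex.
    write_io.puts(sig)
  end
end

# For demonstration only: deliver SIGTERM to ourselves from a thread.
Thread.new do
  sleep 0.2
  Process.kill("TERM", Process.pid)
end

# Main loop: block until a signal name arrives on the pipe, then do
# the logging and shutdown from normal (non-trap) context.
signal = read_io.gets.strip
logger.info("Received SIG#{signal}, shutting down consumer")
```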

  • Version of Ruby: 2.3.1
  • Version of Kafka: 0.11.0.1
  • Version of ruby-kafka: 0.5.2
@mensfeld
Contributor

It's a common and known problem. You can bypass it by running the logging inside a thread spawned from the signal handler. This is dirty, but it works in 99.999% of cases.
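A minimal sketch of that thread trick (hypothetical names, not racecar's or karafka's actual code): a thread spawned inside the trap handler runs outside trap context, so it is allowed to acquire the Logger's mutex.

```ruby
require "logger"
require "stringio"

# Log to a StringIO so the demo is easy to inspect.
log_device = StringIO.new
logger = Logger.new(log_device)

handler_thread = nil
Signal.trap("TERM") do
  # Calling logger.info directly here would fail: Logger synchronizes
  # writes with a mutex, and mutexes can't be locked in trap context.
  # A freshly spawned thread runs outside trap context, so it can log.
  handler_thread = Thread.new do
    logger.info("SIGTERM received, shutting down consumer")
  end
end

# For demonstration only: send ourselves SIGTERM, then wait for the
# logging thread to be created and to finish.
Process.kill("TERM", Process.pid)
sleep 0.05 until handler_thread
handler_thread.join
```

As mensfeld notes below, this does not theoretically rule out a deadlock, but in practice it is enough even for heavy instrumentation.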

@mensfeld
Contributor

Also, this is racecar-related, not ruby-kafka. Could you please move it here: https://github.com/zendesk/racecar

@GeminPatel
Author

GeminPatel commented Jun 11, 2018

@mensfeld Do you think it is happening due to some of my own logging? I think it is due to @logger.info "Disconnecting broker #{id}" in the broker_pool class.

PS: I only spent 10 minutes looking into this issue, as I was busy creating services. I will look into it in more detail, since it hangs termination quite a bit.

@mensfeld
Contributor

mensfeld commented Jun 11, 2018

@GeminPatel Yes, but handling signals is not part of ruby-kafka, as it is "runtime environment" agnostic. It's a bug in how racecar handles the trap context and in what it does there with the ruby-kafka client, not a ruby-kafka bug. The stacktrace does not always point to the error's source. I know that because I had a similar problem with Karafka and managed to bypass it by spawning a new thread: https://github.com/karafka/karafka/blob/1.3-wip/lib/karafka/status.rb#L24

It does not (theoretically) always prevent a deadlock, but it is more than enough even for extensive instrumentation.

@GeminPatel
Author

GeminPatel commented Jun 11, 2018

@mensfeld: I created the issue here because I thought racecar was using the ruby-kafka API correctly, and because it seems expected that racecar [or any other wrapper around ruby-kafka] will eventually call this broker-pool close method from the trap handler. So I thought it would be better for ruby-kafka to handle this. But if you say so, I will create a ticket there as well. As you said, though, they can't really fix this 100% on their own without monkey-patching or rewriting some classes. Still, I agree that ruby-kafka should stay agnostic. :(

@mensfeld
Contributor

I don't think this should be fixed in ruby-kafka. I hope @dasch does not disagree with me on that. As I said, ruby-kafka is not context-aware and provides a different layer of the Kafka processing stack than racecar (and other frameworks). This happens because of the way racecar uses ruby-kafka within the trap context, and because of the instrumentation and logging involved. Nonetheless, it's still a racecar issue that involves ruby-kafka, not the other way around.

@GeminPatel
Author

@mensfeld Given that the production logging level is almost always info, and that racecar can never fix this issue 100% on its own without monkey-patching or rewriting some classes, I don't know what other way there is to be sure this does not occur. But I think the thread approach is workable if we really do not want to change ruby-kafka.

@mensfeld
Contributor

Keep in mind that just because a trap occurs, it does not follow that the ruby-kafka client needs to receive the shutdown signal from within the trap handler. There are other patterns for handling that.

@dasch
Collaborator

dasch commented Jun 12, 2018

Yeah, I think Racecar should ensure that the calls to ruby-kafka don't happen from a trap context. Can you open an issue on Racecar?

@GeminPatel
Author

@dasch Already did. I have referenced this issue there as well.

@0x2c7
Contributor

0x2c7 commented Sep 22, 2018

Since this issue belongs to Racecar and is not directly related to ruby-kafka, I'll close it.

@0x2c7 0x2c7 closed this as completed Sep 22, 2018