deadlock occurred when we restart elasticsearch 7 #587

Closed
wyukawa opened this issue May 13, 2019 · 3 comments

wyukawa (Contributor) commented May 13, 2019

Problem

We send logs to our Elasticsearch cluster with fluent-plugin-elasticsearch, but a deadlock error occurred when we restarted Elasticsearch.

I'm not sure whether this problem is caused by fluent-plugin-elasticsearch or by elasticsearch-ruby.

This error also occurred with fluent-plugin-elasticsearch 2.11.10.
At that time I thought it was related to elasticsearch-ruby, so I filed elastic/elasticsearch-ruby#656.

At least, the problem does not occur with fluent-plugin-elasticsearch 2.11.10 and Elasticsearch 6.7.1.

2019-05-13 14:32:19 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-05-13 14:32:20 +0900 chunk="588be3d16914130eecf307cf6625901d" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:758:in `rescue in send_bulk'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:733:in `send_bulk'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:628:in `block in write'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `each'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `write'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-05-13 14:32:45 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-05-13 14:32:46 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:45 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:46 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-05-13 14:32:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:46 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=3 next_retry_seconds=2019-05-13 14:32:52 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:52 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=4 next_retry_seconds=2019-05-13 14:33:00 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:52 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:00 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=5 next_retry_seconds=2019-05-13 14:33:17 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:00 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:17 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=6 next_retry_seconds=2019-05-13 14:33:51 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:17 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:51 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=7 next_retry_seconds=2019-05-13 14:34:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:51 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:34:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=8 next_retry_seconds=2019-05-13 14:36:54 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:34:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:36:54 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=9 next_retry_seconds=2019-05-13 14:41:29 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:36:54 +0900 [warn]: #0 suppressed same stacktrace
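
For context, "deadlock; recursive locking" is the message of the ThreadError that Ruby's Mutex raises when a thread tries to lock a mutex it already holds, which suggests some synchronized section in the transport's failure-handling path is being re-entered. A minimal sketch in plain Ruby (illustration only, not the plugin's actual locking code) that reproduces the same message:

  # Ruby's Mutex is non-reentrant: locking it again from the owning
  # thread raises ThreadError with the message seen in the logs above.
  mutex = Mutex.new
  mutex.synchronize do
    mutex.synchronize do   # second lock attempt by the same thread
      puts "never reached"
    end
  end
  # => ThreadError: deadlock; recursive locking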

Steps to replicate

The data flow is:
kafka -> kafka-fluentd-consumer -> fluentd v1 -> fluent-plugin-elasticsearch -> elasticsearch

fluentd conf

  <match {...}.**>
    @type elasticsearch
    @id ...
    hosts ...
    logstash_format true
    logstash_prefix "${tag}"
    utc_index false
    include_tag_key true
    tag_key "tag_key"
    request_timeout 30s
    # connection handling: resurrect failed nodes immediately and reload
    # the connection list on failure, but never reload it periodically
    resurrect_after 0
    reload_on_failure true
    reload_connections false
    http_backend typhoeus
    type_name "_doc"
    bulk_message_request_threshold -1
    <buffer tag>
      @type "file"
      flush_at_shutdown true
      chunk_limit_size 8m
      queue_limit_length 512
      flush_interval 1s
      # ten flush threads send bulk requests concurrently
      flush_thread_count 10
      overflow_action drop_oldest_chunk
    </buffer>
  </match>
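
For reference, the transport-related settings above correspond roughly to the following elasticsearch-ruby client options (a sketch with assumed example hosts; option names are from elasticsearch-transport, and the plugin constructs the client internally):

  require "elasticsearch"

  # Illustration of the client configuration implied by the plugin
  # settings; not how the plugin literally builds its client.
  client = Elasticsearch::Client.new(
    hosts: ["host1:9200", "host2:9200", "host3:9200"],
    request_timeout: 30,        # request_timeout 30s
    resurrect_after: 0,         # resurrect_after 0
    reload_on_failure: true,    # reload_on_failure true
    reload_connections: false,  # reload_connections false
    adapter: :typhoeus          # http_backend typhoeus
  )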

Expected Behavior or What you need to ask

No deadlock

When the deadlock occurs, fluent-plugin-elasticsearch keeps retrying and failing, so the buffer queue length grows; a sketch of the retry backoff follows below.
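
The retry timestamps in the log above follow fluentd's default exponential backoff. A sketch of that schedule, assuming the defaults (retry_wait 1s, retry_exponential_backoff_base 2) and ignoring randomization:

  # Approximate intervals between retries with fluentd's defaults;
  # the actual intervals are slightly randomized, as the log shows.
  retry_wait = 1
  10.times do |retry_count|
    interval = retry_wait * (2 ** retry_count)
    puts "retry_time=#{retry_count} -> next retry in ~#{interval}s"
  end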

Using Fluentd and ES plugin versions

  • OS version
    CentOS Linux release 7.6.1810 (Core)
  • Bare Metal or within Docker or Kubernetes or others?
    Bare Metal
  • Fluentd v0.12 or v0.14/v1.0
    1.4.2
  • ES plugin 3.x.y/2.x.y or 1.x.y
    3.5.0
2019-05-13 14:11:17 +0900 [info]: starting fluentd-1.4.2 pid=105220 ruby="2.6.3"
...
2019-05-13 14:11:17 +0900 [info]: gem 'fluentd' version '1.4.2'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-elasticsearch' version '3.5.0'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-ping-message' version '1.0.0'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-prometheus' version '1.3.0'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-record-reformer' version '0.9.1'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-suppress' version '1.1.0'
...

Gemfile

source 'https://rubygems.org'

gem "fluentd", "1.4.2"
gem "oj", "3.7.12"

gem "fluent-plugin-ping-message", "1.0.0"
gem "fluent-plugin-record-reformer", "0.9.1"
gem "fluent-plugin-suppress", "1.1.0"
gem 'fluent-plugin-elasticsearch', :git => 'https://github.com/uken/fluent-plugin-elasticsearch.git', :branch => "configurable-split_request_p"
gem "typhoeus", "1.3.1"
gem "fluent-plugin-prometheus", "1.3.0"
  • ES version (optional)
    7.0.1
  • ES template(s) (optional)
wyukawa (Contributor, Author) commented Jun 20, 2019

This problem doesn't happen with fluent-plugin-elasticsearch 3.5.2, elasticsearch-ruby 7.1.0, and Elasticsearch 7.1.1.

wyukawa closed this as completed Jun 21, 2019

Z0T0 commented Jul 16, 2019

Please increase 'chunk_limit_size'

cosmo0920 (Collaborator) commented

@Z0T0 You shouldn't post your report in an unrelated issue. Please open a new issue instead.
