Fluentd stopped sending data to ES for a while. #525

Closed
hustshawn opened this issue Jan 10, 2019 · 26 comments

@hustshawn commented Jan 10, 2019

Problem

I used fluentd with your plugin to collect logs from Docker containers and send them to ES. It works at the very beginning, but later ES stops receiving logs from fluentd. ES itself is always running fine, and I find there is no index for the new day in ES (e.g. no fluentd-20190110; only the old index fluentd-20190109 exists).

However, if I restart my Docker containers with fluentd, it starts sending logs to ES again.

...

Steps to replicate

The fluentd config

# fluentd/conf/fluent.conf
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
<match *.**>
  @type copy
  <store>
    @type elasticsearch
    host my-es-host
    port 9200
    logstash_format true
    logstash_prefix fluentd
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 5s
  </store>
  <store>
    @type stdout
  </store>
</match>

Expected Behavior or What you need to ask

Fluentd should keep sending logs to ES.

Using Fluentd and ES plugin versions

  • OS version
  • Bare Metal or within Docker or Kubernetes or others?
    Docker
  • Fluentd v0.12 or v0.14/v1.0
    • paste result of fluentd --version or td-agent --version
      v1.3.2-1.0
  • ES plugin 2.x.y or 1.x.y
    • paste boot log of fluentd or td-agent
    • paste result of fluent-gem list, td-agent-gem list or your Gemfile.lock
  • ES version (optional)
    6.5.4

@cosmo0920 (Collaborator) commented Jan 10, 2019

Could you provide your Fluentd docker log?

<match *.**>

The above setting is very dangerous.
This blackhole pattern causes a flood of declined logs:
https://github.com/uken/fluent-plugin-elasticsearch#declined-logs-are-resubmitted-forever-why
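
One common way to avoid the blackhole pattern is to route Fluentd's own fluent.** events away from the catch-all match so they can never loop back into Elasticsearch. A minimal sketch (the null output here is just one way to discard them):

```aconf
# Handle Fluentd's internal events first, so the catch-all match below never sees them
<match fluent.**>
  @type null
</match>

<match *.**>
  @type copy
  # ... elasticsearch / stdout stores as in the original config ...
</match>
```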

@hustshawn (Author) commented Jan 10, 2019

Hi @cosmo0920,
The fluentd logs look like this:

fluentd_1        | 2019-01-09 03:15:52 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
fluentd_1        | 2019-01-09 03:15:52 +0000 [info]: 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
fluentd_1        | 2019-01-09 03:15:52 +0000 [info]: Detected ES 6.x: ES 7.x will only accept `_doc` in type_name.
fluentd_1        | 2019-01-09 03:15:52 +0000 [warn]: To prevent events traffic jam, you should specify 2 or more 'flush_thread_count'.
fluentd_1        | 2019-01-09 03:15:52 +0000 [info]: using configuration file: <ROOT>
fluentd_1        |   <source>
fluentd_1        |     @type forward
fluentd_1        |     port 24224
fluentd_1        |     bind "0.0.0.0"
fluentd_1        |   </source>
fluentd_1        |   <match *.**>
fluentd_1        |     @type copy
fluentd_1        |     <store>
fluentd_1        |       @type "elasticsearch"
fluentd_1        |       host my-es-host
fluentd_1        |       port 9200
fluentd_1        |       logstash_format true
fluentd_1        |       logstash_prefix "fluentd"
fluentd_1        |       logstash_dateformat "%Y%m%d"
fluentd_1        |       include_tag_key true
fluentd_1        |       type_name "access_log"
fluentd_1        |       tag_key "@log_name"
fluentd_1        |       flush_interval 1s
fluentd_1        |       <buffer>
fluentd_1        |         flush_interval 1s
fluentd_1        |       </buffer>
fluentd_1        |     </store>
fluentd_1        |     <store>
fluentd_1        |       @type "stdout"
fluentd_1        |     </store>
fluentd_1        |   </match>
fluentd_1        | </ROOT>
fluentd_1        | 2019-01-09 03:15:52 +0000 [info]: starting fluentd-1.3.2 pid=5 ruby="2.5.2"
fluentd_1        | 2019-01-09 03:15:52 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"]
fluentd_1        | 2019-01-09 03:15:53 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '3.0.1'
fluentd_1        | 2019-01-09 03:15:53 +0000 [info]: gem 'fluentd' version '1.3.2'
fluentd_1        | 2019-01-09 03:15:53 +0000 [info]: adding match pattern="*.**" type="copy"
fluentd_1        | 2019-01-09 03:15:53 +0000 [info]: #0 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
fluentd_1        | 2019-01-09 03:15:53 +0000 [info]: #0 Detected ES 6.x: ES 7.x will only accept `_doc` in type_name.
fluentd_1        | 2019-01-09 03:15:53 +0000 [warn]: #0 To prevent events traffic jam, you should specify 2 or more 'flush_thread_count'.
fluentd_1        | 2019-01-09 03:15:53 +0000 [info]: adding source type="forward"
fluentd_1        | 2019-01-09 03:15:53 +0000 [info]: #0 starting fluentd worker pid=13 ppid=5 worker=0
fluentd_1        | 2019-01-09 03:15:53 +0000 [info]: #0 listening port port=24224 bind="0.0.0.0"
fluentd_1        | 2019-01-09 03:15:53 +0000 [info]: #0 fluentd worker is now running worker=0
fluentd_1        | 2019-01-09 03:15:53.601732394 +0000 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"}

....

@cosmo0920 (Collaborator) commented Jan 10, 2019

Umm..., could you share the fluentd error log from 2019-01-10 2:00 to 2019-01-10 11:00?

The shared log is the boot log. It just says that Fluentd was launched normally.

@hustshawn (Author) commented Jan 10, 2019

@cosmo0920 I found something like this:

fluentd_1        | 2019-01-10 02:16:45 +0000 [warn]: #0 failed to flush the buffer. retry_time=15 next_retry_seconds=2019-01-10 07:21:51 +0000 chunk="57f0d689aeefe7b1ef1da592fed4d444" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"my-es-host\", :port=>9200, :scheme=>\"http\"}): Connection refused - connect(2) for 172.18.0.2:9200 (Errno::ECONNREFUSED)"
fluentd_1        |   2019-01-10 02:16:45 +0000 [warn]: #0 suppressed same stacktrace
fluentd_1        | 2019-01-10 02:16:45.424613201 +0000 fluent.warn: {"retry_time":15,"next_retry_seconds":"2019-01-10 07:21:51 +0000","chunk":"57f0d689aeefe7b1ef1da592fed4d444","error":"#<Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure: could not push logs to Elasticsearch cluster ({:host=>\"my-es-host\", :port=>9200, :scheme=>\"http\"}): Connection refused - connect(2) for 172.18.0.2:9200 (Errno::ECONNREFUSED)>","message":"failed to flush the buffer. retry_time=15 next_retry_seconds=2019-01-10 07:21:51 +0000 chunk=\"57f0d689aeefe7b1ef1da592fed4d444\" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error=\"could not push logs to Elasticsearch cluster ({:host=>\\\"my-es-host\\\", :port=>9200, :scheme=>\\\"http\\\"}): Connection refused - connect(2) for 172.18.0.2:9200 (Errno::ECONNREFUSED)\""}

@cosmo0920 (Collaborator) commented Jan 10, 2019

It seems that the ES plugin cannot push events due to ECONNREFUSED.
This error comes from the network stack.
Could you check the Docker networking settings or the ES-side logs?

@hustshawn (Author) commented Jan 10, 2019

@cosmo0920 My ES is set up on AWS EC2, and the networking should be fine, with no disconnects or DNS issues.
I also found some extra logs just above the previous ones.

fluentd_1        |   2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:645:in `rescue in send_bulk'
fluentd_1        |   2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:627:in `send_bulk'
fluentd_1        |   2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:534:in `block in write'
fluentd_1        |   2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:533:in `each'
fluentd_1        |   2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:533:in `write'
fluentd_1        |   2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.3.2/lib/fluent/plugin/output.rb:1123:in `try_flush'
fluentd_1        |   2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.3.2/lib/fluent/plugin/output.rb:1423:in `flush_thread_run'
fluentd_1        |   2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.3.2/lib/fluent/plugin/output.rb:452:in `block (2 levels) in start'
fluentd_1        |   2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.3.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

@hustshawn (Author) commented Jan 10, 2019

@cosmo0920 Here are more logs from ES:

elasticsearch_1  | [2019-01-10T04:41:01,689][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:01,689][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:01,795][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:01,795][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:01,823][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:01,823][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:01,833][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:01,833][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:01,835][INFO ][o.e.c.r.a.AllocationService] [-utwWeF] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[fluentd-20190108][2]] ...]).
elasticsearch_1  | [2019-01-10T04:41:01,843][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:01,847][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:08,712][INFO ][o.e.c.m.MetaDataMappingService] [-utwWeF] [fluentd-20190110/j4oWJJa8Rla-l48sMgHLog] update_mapping [access_log]
elasticsearch_1  | [2019-01-10T04:41:08,724][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T04:41:08,724][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T06:18:09,832][INFO ][o.e.c.m.MetaDataMappingService] [-utwWeF] [fluentd-20190110/j4oWJJa8Rla-l48sMgHLog] update_mapping [access_log]
elasticsearch_1  | [2019-01-10T06:18:09,843][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T06:18:09,843][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T06:18:09,859][INFO ][o.e.c.m.MetaDataMappingService] [-utwWeF] [fluentd-20190110/j4oWJJa8Rla-l48sMgHLog] update_mapping [access_log]
elasticsearch_1  | [2019-01-10T06:18:09,867][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1  | [2019-01-10T06:18:09,868][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata

Also, I actually have two nodes/hosts with the same configuration collecting logs from my application servers. Do you think that could be relevant to this issue?

If so, is there any way in the fluentd configuration to distinguish which node the logs were collected from, e.g. the hostname or host IP as metadata?

@cosmo0920 (Collaborator) commented Jan 11, 2019

Do you think that could be relevant to this issue?

You should check the Docker networking; a bare-metal environment might not hit this networking issue.
Here is another case caused by Docker networking: #416

That issue also occurred only within Docker, not in a bare-metal environment.

If so, is there any way in the fluentd configuration to distinguish which node the logs were collected from, e.g. the hostname or host IP as metadata?

in_forward has an option that adds the hostname:
https://docs.fluentd.org/v1.0/articles/in_forward#source_hostname_key
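
A minimal sketch of that option in the forward source (the key name source_hostname is just an illustrative choice):

```aconf
<source>
  @type forward
  port 24224
  bind 0.0.0.0
  # Store the sending host's hostname in each record under this key
  source_hostname_key source_hostname
</source>
```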

@hustshawn (Author) commented Jan 14, 2019

@cosmo0920 Thanks for your advice. But I have to run fluentd in Docker, and it looks like the issue is still there. The services in my Docker setup are always running well, so it's probably not a Docker networking issue.

@emmayang commented Feb 12, 2019

I hit a similar issue, but I have fluentd deployed as a DaemonSet under the kube-system namespace.

I can confirm ES is running well all the time, since fluentd is only one of my logging sources, and the other sources work fine and show logs correctly in ES.

@hustshawn (Author) commented Feb 13, 2019

@emmayang Same issue on my kube platform.

@cosmo0920 (Collaborator) commented Feb 13, 2019

Hmmm..., could you try the typhoeus backend instead of excon?
typhoeus handles keep-alive by default.
https://github.com/uken/fluent-plugin-elasticsearch#http_backend
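
A minimal sketch of switching the backend in the elasticsearch store (assuming the typhoeus gem is installed in the Fluentd image):

```aconf
<store>
  @type elasticsearch
  host my-es-host
  port 9200
  # Use the typhoeus HTTP client, which keeps connections alive by default
  http_backend typhoeus
  # ... other settings unchanged ...
</store>
```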

@twittyc commented Feb 18, 2019

I'm also seeing this same issue when running fluentd with the ES plugin in Kubernetes. I tried both backends: typhoeus didn't work at all, while the default backend would work on the initial connection (fresh deploy) and then stop sending data almost immediately.

EDIT: I believe my issues were not caused by the ES plugin but by performance tuning I needed to do on Fluentd.

@aaron1989041 commented Mar 19, 2019

I have similar problems. I also get a huge number of warnings like the one below:
"failed to flush the buffer. retry_time=0 next_retry_seconds=2019-03-19 01:30:36 +0000 chunk="584686c3d47849db61228ea7e6f29bb5" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"es-cn-v0h10rbfl000kfon8..com\", :port=>9200, :scheme=>\"http\", :user=>\"elastic\", :password=>\"obfuscated\"}): connect_write timeout reached""
When this error happens, the only way out is to restart the fluentd container, but then a gap in the logs occurs.

@ChSch3000 commented Mar 19, 2019

Same problem here. I'm using fluentd-kubernetes-daemonset.
I already opened an issue here:
fluent/fluentd-kubernetes-daemonset#280
After deployment the plugin works fine and ships all logs to ES, but after a few hours it stops with the following error:

2019-03-19 08:24:32 +0000 : #0 [out_es] failed to flush the buffer. retry_time=2810 next_retry_seconds=2019-03-19 08:25:05 +0000 chunk="5846b2b0d6d06c398eee3540256d465d" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elastic.xyz.com\", :port=>443, :scheme=>\"https\", :user=>\"elastic\", :password=>\"obfuscated\", :path=>\"\"}): connect_write timeout reached"

The only solution is to restart the pod, but that isn't an acceptable solution.

@cosmo0920 (Collaborator) commented Mar 19, 2019

Does setting reload_connections to false help with this issue?
I launched a docker-compose environment with the settings from fluent/fluentd#2334 (comment), but I couldn't reproduce it in my local environment.
Do we need to handle a massive number of events to reproduce this issue?

@bidiudiu commented Mar 20, 2019

Does setting reload_connections to false help with this issue?
I launched a docker-compose environment with the settings from fluent/fluentd#2334 (comment), but I couldn't reproduce it in my local environment.
Do we need to handle a massive number of events to reproduce this issue?

@cosmo0920, I'm afraid so... In my case, the hit count reaches 100,000+ and then the issue happens.

In fluentd, here's the error info:

2019-03-20 02:07:53 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2019-03-20 02:07:54 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f880ef7f118"
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/base.rb:249:in `perform_request'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/client.rb:128:in `perform_request'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/elasticsearch-api-1.0.18/lib/elasticsearch/api/actions/bulk.rb:90:in `bulk'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch.rb:353:in `send_bulk'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch.rb:339:in `write_objects'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/output.rb:490:in `write'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/buffer.rb:354:in `write_chunk'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/buffer.rb:333:in `pop'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/output.rb:342:in `try_flush'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/output.rb:149:in `run'

I'll try 'reconnect_on_error true' and give feedback.

@ChSch3000 commented Mar 20, 2019

Does setting reload_connections to false help with this issue?
I launched a docker-compose environment with the settings from fluent/fluentd#2334 (comment), but I couldn't reproduce it in my local environment.
Do we need to handle a massive number of events to reproduce this issue?

Maybe this is the solution for me. With reload_connections set to false, it has now been working for about 18 hours without trouble. I will monitor it for the next few hours / days.

@cosmo0920 (Collaborator) commented Mar 20, 2019

@bidiudiu @ChSch3000 Thank you for confirming and clarifying the issue!

fluentd-kubernetes-daemonset provides the following environment variable:

  • FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS (default: true)

It should be set to false (a config sketch follows below):

  • FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS=false
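
Under the hood, the daemonset's config template reads that variable with Fluentd's embedded-Ruby syntax, roughly like this (an illustrative sketch, not the exact template):

```aconf
<match **>
  @type elasticsearch
  # Taken from the container environment; treated as true when unset
  reload_connections "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS'] || 'true'}"
  # ... host, port, etc. are populated from other FLUENT_ELASTICSEARCH_* variables ...
</match>
```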

@cosmo0920 mentioned this issue Mar 20, 2019

@cosmo0920 (Collaborator) commented Mar 20, 2019

I've added an FAQ entry for this situation: #564

Is any information still missing to resolve this issue?

@bidiudiu commented Mar 22, 2019

Thanks @cosmo0920. I added the settings below and it works fine:

reconnect_on_error true
reload_on_failure true
reload_connections false
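
For context, in the original reporter's configuration these settings would sit inside the elasticsearch <store> block, something like this (a sketch based on the config at the top of the thread):

```aconf
<store>
  @type elasticsearch
  host my-es-host
  port 9200
  logstash_format true
  logstash_prefix fluentd
  # Avoid relying on periodically reloaded (and possibly stale) connection info
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
</store>
```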

@cosmo0920 (Collaborator) commented Mar 22, 2019

reconnect_on_error true
reload_on_failure true
reload_connections false

OK. Thanks for confirming, @bidiudiu!
I'll add more description of this issue to the FAQ.

cosmo0920 added a commit to cosmo0920/fluentd-kubernetes-daemonset that referenced this issue Mar 22, 2019
This is reported in
uken/fluent-plugin-elasticsearch#525.

Invalid sniffer information is obtained by default, but we can avoid it with
the following configuration:

```aconf
reload_connections false
reconnect_on_error true
reload_on_failure true
```

To specify reload_on_failure in fluentd-kubernetes-daemonset,
we should introduce a new env var for it.

Signed-off-by: Hiroshi Hatake <hatake@clear-code.com>
cosmo0920 added a commit to cosmo0920/fluentd-kubernetes-daemonset that referenced this issue Mar 22, 2019
…e sniffering

@cosmo0920 mentioned this issue Mar 22, 2019

cosmo0920 added a commit to cosmo0920/fluentd-kubernetes-daemonset that referenced this issue Apr 12, 2019
fluent-plugin-elasticsearch reloads connections after 10000 requests. (This does not correspond to the event count, because the ES plugin uses the bulk API.)

This functionality, which originates in the elasticsearch-ruby gem, is enabled by default.

Sometimes this reloading functionality prevents users from sending events with the ES plugin.

On the k8s platform, users sometimes need to specify the following settings:

```aconf
reload_connections false
reconnect_on_error true
reload_on_failure true
```

This was originally reported at
uken/fluent-plugin-elasticsearch#525.

On k8s, Fluentd sometimes has to handle a flood of events.
This is a pitfall of using fluent-plugin-elasticsearch on k8s,
so this parameter set should be the default.

Signed-off-by: Hiroshi Hatake <hatake@clear-code.com>

@dogzzdogzz commented May 9, 2019

Can we change the default values of those settings in fluentd-kubernetes-daemonset? I think everyone who uses fluentd-kubernetes-daemonset will easily run into this issue.

@hustshawn (Author) commented May 9, 2019

@dogzzdogzz If you are installing with Helm, e.g. helm upgrade --install logging-fluentd -f your-values.yml kiwigrid/fluentd-elasticsearch --namespace your-namespace, you can just modify the fluentd config in your-values.yml.

Part of my snippet looks like this:

  output.conf: |
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>
    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      include_tag_key true
      type_name _doc
      host "#{ENV['OUTPUT_HOST']}"
      port "#{ENV['OUTPUT_PORT']}"
      scheme "#{ENV['OUTPUT_SCHEME']}"
      ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
      logstash_format true
      logstash_prefix "#{ENV['LOGSTASH_PREFIX']}"
      reload_connections false
      reconnect_on_error true
      reload_on_failure true
      slow_flush_log_threshold 25.0
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        flush_interval 5s
        flush_thread_count 4
        chunk_full_threshold 0.9
        # retry_forever
        retry_type exponential_backoff
        retry_timeout 1m
        retry_max_interval 30
        chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
        queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
        overflow_action drop_oldest_chunk
      </buffer>
    </match>

@cosmo0920 (Collaborator) commented May 9, 2019

@dogzzdogzz The latest fluentd-kubernetes-daemonset includes the above settings by default.

zirkome added a commit to auth0/fluentd-kubernetes-daemonset that referenced this issue Jun 3, 2019

@darthchudi commented Jul 1, 2019

I tried using the exact same config as #525 (comment), but the issue still persists: Fluentd stops shipping logs to Elasticsearch after some time.
