TCP syslog writer should redial on connection lost (agent) #10

Open
umputun opened this issue Aug 4, 2019 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@umputun
Owner

umputun commented Aug 4, 2019

If the dkll server fails or restarts while the agent talks to it over TCP (the non-default transport), the agent's writers can lose the connection silently. The agent does have a repeater, but it covers only the dial stage and runs only on a new container event.

umputun added a commit that referenced this issue Aug 4, 2019
umputun mentioned this issue Aug 11, 2019
umputun added the help wanted (Extra attention is needed) label Aug 12, 2019
@umputun
Owner Author

umputun commented Aug 12, 2019

I have tried several approaches:

  1. Rely on the built-in retry mechanism of the std syslog
  2. Wrap syslog's write and handle failed writes with a retry
  3. Fail on a write error and let docker restart the container
  4. Replace syslog with gsyslog, which supports timeouts

Each of these changes solved the problem only partially, at best. I didn't have much time to hunt it down, but if anyone wants to investigate and propose a working solution, PRs are very welcome.
