Startup checks for sources and sinks #67

Closed
lukesteensen opened this issue Jan 4, 2019 · 2 comments

@lukesteensen
Member

We should allow sources and sinks to run a health check on their dependencies at startup to avoid situations where we boot up and then immediately fail or go into a surprising retry loop. This could potentially tie into #66.

One thing to keep in mind is that there may be cases (e.g. intermittent connectivity issues or recovering after an incident) where starting up in spite of certain types of issues is desirable. A simple solution would be a flag to skip these checks, but a better (and more difficult) one would be some way of differentiating between errors that can be retried and those that are fatal.
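To make the "flag to skip these checks" idea concrete, here is a minimal sketch in Python. It is not the project's actual API: `HealthCheck`, `CheckResult`, `run_startup_checks`, `should_start`, and the `skip_checks` flag are all hypothetical names chosen for illustration, and each component is assumed to expose a zero-argument check that raises on failure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: every source or sink provides a zero-argument callable
# that raises an exception when its dependency is unhealthy.
HealthCheck = Callable[[], None]


@dataclass
class CheckResult:
    name: str
    ok: bool
    error: str = ""


def run_startup_checks(checks: Dict[str, HealthCheck]) -> List[CheckResult]:
    """Run every check once and record the outcome instead of aborting early."""
    results = []
    for name, check in checks.items():
        try:
            check()
            results.append(CheckResult(name, True))
        except Exception as exc:  # any failure counts as an unhealthy dependency
            results.append(CheckResult(name, False, str(exc)))
    return results


def should_start(results: List[CheckResult], skip_checks: bool = False) -> bool:
    """Refuse to start on any failed check unless the operator opted out."""
    if skip_checks:
        return True
    return all(result.ok for result in results)
```

Running all checks before deciding means the operator sees every failing dependency at once, and the hypothetical `skip_checks` flag covers the recovery-after-incident case where starting anyway is the desired behavior.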

@michaelfairley
Contributor

michaelfairley commented Jan 15, 2019

"but a better (and more difficult) one would be some way of differentiating between errors that can be retried and those that are fatal."

I don't think this is possible. For instance, a connection refused from a Splunk HEC sink could mean either that the HEC server is down (a temporary error) or that the URL for the HEC server in the config file is wrong (a permanent error). I think we'll want a subcommand that validates the config file and runs the healthchecks (but doesn't actually start the server) as a tool to use while getting the router set up. For a smoothly running router setup, though, the healthchecks are informational rather than something that affects how the router operates.
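A rough sketch of that split, in Python with `argparse`, might look like the following. Everything here is an assumption for illustration: the `router` program name, the `validate` and `run` subcommand names, and the `load_checks` and `start` callables (a hypothetical config loader that raises on an invalid file and returns named healthchecks, and a hypothetical function that starts the server).

```python
import argparse
from typing import Callable, Dict, List

HealthCheck = Callable[[], None]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="router")  # hypothetical program name
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("validate", "run"):  # hypothetical subcommand names
        sub.add_parser(name).add_argument("config")
    return parser


def failed_checks(checks: Dict[str, HealthCheck]) -> List[str]:
    """Return one message per healthcheck that raised."""
    failures = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            failures.append(f"{name}: {exc}")
    return failures


def main(
    argv: List[str],
    load_checks: Callable[[str], Dict[str, HealthCheck]],
    start: Callable[[], None],
) -> int:
    args = build_parser().parse_args(argv)
    checks = load_checks(args.config)  # assumed to raise if the config is invalid
    failures = failed_checks(checks)
    for failure in failures:
        print(f"healthcheck failed: {failure}")
    if args.command == "validate":
        # Setup-time tool: report problems through the exit code, never start.
        return 1 if failures else 0
    # Normal operation: failures were only reported above; start regardless.
    start()
    return 0
```

Under `validate` a failing healthcheck turns into a non-zero exit code, while under `run` the same failure is only printed and the server still starts, which matches the "informational" role described above.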

@lukesteensen
Member Author

Yeah, that's probably true. You'd have to assume every error might be permanent and then mark specific cases as retriable. Even that would probably be very limited because of cases like the one you mentioned, where the same error could be either.
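The "permanent by default, retriable by exception" approach could be sketched like this in Python. The `Severity` enum, the `RETRIABLE_ERRORS` allowlist, and the choice of `TimeoutError` as its only entry are assumptions made for the example, not a proposal for which errors actually belong there.

```python
from enum import Enum


class Severity(Enum):
    RETRIABLE = "retriable"
    PERMANENT = "permanent"


# Hypothetical allowlist: only errors explicitly listed here are retried.
# ConnectionRefusedError is deliberately absent, since a refused connection
# could mean either a server that is down or a wrong URL in the config.
RETRIABLE_ERRORS = (TimeoutError,)


def classify(exc: BaseException) -> Severity:
    """Treat every error as permanent unless it is on the allowlist."""
    if isinstance(exc, RETRIABLE_ERRORS):
        return Severity.RETRIABLE
    return Severity.PERMANENT
```

The ambiguous connection-refused case falls through to `PERMANENT`, which shows the limitation discussed above: the allowlist can only hold errors whose cause is never a misconfiguration.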

@binarylogic binarylogic added this to the 0.1 milestone Mar 20, 2019
syedriko referenced this issue in syedriko/vector Jun 13, 2022
Fix PROTOC env variable value