We should allow sources and sinks to run a health check on their dependencies at startup to avoid situations where we boot up and then immediately fail or go into a surprising retry loop. This could potentially tie into #66.
One thing to keep in mind is that there may be cases (e.g. intermittent connectivity issues or recovering after an incident) where starting up in spite of certain types of issues is desirable. A simple solution would be a flag to skip these checks, but a better (and more difficult) one would be some way of differentiating between errors that can be retried and those that are fatal.
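As a rough illustration of the "simple solution" above, here is a minimal Python sketch of startup health checks guarded by a skip flag. Every name in it (`run_healthchecks`, `may_start`, `skip_healthchecks`) is hypothetical and not part of any existing API. It assumes each source or sink exposes a zero-argument check that raises on failure.

```python
from typing import Callable, Dict

# Assumption: a healthcheck is a zero-argument callable that raises on failure.
Healthcheck = Callable[[], None]


def run_healthchecks(checks: Dict[str, Healthcheck]) -> Dict[str, str]:
    """Run every check once and return {component name: failure reason}."""
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            failures[name] = str(exc)
    return failures


def may_start(checks: Dict[str, Healthcheck], skip_healthchecks: bool = False) -> bool:
    """Decide whether startup should proceed.

    With skip_healthchecks=True (the hypothetical flag), checks are not run
    at all, which covers the "start up in spite of issues" case.
    """
    if skip_healthchecks:
        return True
    failures = run_healthchecks(checks)
    for name, reason in failures.items():
        print(f"healthcheck failed for {name}: {reason}")
    return not failures
```

This sketch treats every failure as fatal unless the flag is set; it makes no attempt to tell retriable errors from permanent ones.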
"but a better (and more difficult) one would be some way of differentiating between errors that can be retried and those that are fatal."
I don't think this is possible. For instance, a connection refused from a Splunk HEC sink could mean either that the HEC server is down (a temporary error) or that the URL for the HEC server in the config file is wrong (a permanent error). I think we'll want a subcommand that validates the config file and runs the healthchecks (but doesn't actually start the server) as a tool to use while getting router set up. For a smoothly running router setup, though, the healthchecks are more informational than something that affects how the router operates.
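A possible shape for that split, sketched in Python with `argparse`: a `validate` subcommand whose exit code reflects the healthchecks, and a `run` subcommand that only reports them. The subcommand names, the `--config` option, and the `report` and `main` helpers are assumptions made for illustration. Config parsing is left out, so the checks are passed in directly.

```python
import argparse
from typing import Callable, Dict, List


def report(checks: Dict[str, Callable[[], None]]) -> int:
    """Run each healthcheck once, print a line per component, return the failure count."""
    failed = 0
    for name, check in checks.items():
        try:
            check()
            print(f"ok    {name}")
        except Exception as exc:
            failed += 1
            print(f"FAIL  {name}: {exc}")
    return failed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="router")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("run", "validate"):
        cmd = sub.add_parser(name)
        # Parsed but unused here: loading the config is outside this sketch.
        cmd.add_argument("--config", required=True)
    return parser


def main(argv: List[str], checks: Dict[str, Callable[[], None]]) -> int:
    args = build_parser().parse_args(argv)
    failed = report(checks)
    if args.command == "validate":
        # Setup-time tool: a failing check yields a non-zero exit code.
        return 1 if failed else 0
    # "run": failures are informational only, so startup proceeds regardless.
    return 0
```

Under these assumptions, `main(["validate", "--config", "router.toml"], checks)` returns 1 when any check raises, while the `run` path prints the same report and continues.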
Yeah, that's probably true. You'd have to assume every error is potentially permanent and then mark specific cases as retriable. And even that would probably be very limited because of cases like the one you mentioned, where the same error could be either.
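A minimal sketch of that default-fatal approach, in Python: everything is classified as fatal unless its type appears on an explicit allowlist. The `Severity` enum, the `RETRIABLE_ERRORS` tuple, and `classify` are hypothetical names, and the allowlist contents are only an example. `ConnectionRefusedError` is deliberately left off the list because, as noted above, it can mean either a down server or a wrong URL.

```python
from enum import Enum


class Severity(Enum):
    FATAL = "fatal"
    RETRIABLE = "retriable"


# Hypothetical allowlist: only errors believed to be transient are listed.
# ConnectionRefusedError is intentionally absent because it is ambiguous.
RETRIABLE_ERRORS = (TimeoutError,)


def classify(exc: BaseException) -> Severity:
    """Treat every error as fatal unless its type is explicitly allowlisted."""
    if isinstance(exc, RETRIABLE_ERRORS):
        return Severity.RETRIABLE
    return Severity.FATAL
```

The allowlist stays short by design, which matches the point that this kind of classification would be very limited in practice.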