Remove observability for internal resources #9633
LGTM
I'm not sure if unconditionally dropping observability for all @internal resources is the best way, because maybe there are use cases where you want or need this information. For example if you secure the Prometheus metrics with BasicAuth and want to see the remote IP for the HTTP 403 errors, you need the information for the |
I'm just piping up as a user who'd really like the ability to turn off these logs, as they just add noise and make debugging harder in the common case. If you're trying to debug an issue with internal services, then I guess not having any logs is equally painful. Perhaps the logging for internal services could be configurable, so you could set it to e.g. only log errors rather than every successful ping request. |
Hello @netsandbox @dhirschfeld, Thanks for your feedback! We did think about adding an option to get the observability back for internal resources when opening the PR. This makes total sense and we are going to rework the PR to add the option. |
Hello @netsandbox @dhirschfeld, We changed our mind about introducing the opt-in option to get back internal observability in this PR. |
This should really be implemented with conditional logging as nginx does: https://nginx.org/en/docs/http/ngx_http_log_module.html and also be off by default. Speaking as someone with a systems security background: there's another class of software out there that hides what it's doing from operators by default, and that's malware. Here are the Prometheus metrics for my internal resources for the past 5 days: From an audit perspective, there's an obvious baseline here for what "normal" requests look like, but then there's a sudden increase in requests to a Traefik instance in the last hour. Is that:
Because without logs/metrics - you cannot even reason about any of the above. |
@rtribotte @juliens what is the status of this PR? Could it be merged? As I see it, it would be nice to ignore these internal services. |
To be merged, a PR needs to be up to date, have passing CI, and have 3 approvals. |
Hello, We decided to move the target of this PR to the next milestone, as we changed our minds about making it optional on another iteration. |
I need this change as traefik is producing too many useless spans (health checks) to be economical for us. Aside from getting CI happy, what needs to be done before folks feel ready to approve? Is it at least directionally correct? |
LGTM 🎉
What version will this be included in next and how can one activate / use it? |
What does this PR do?
This PR disables access logs, metrics, and tracing for internal routers and services.
It also introduces an addInternals option for AccessLogs, Metrics, and Tracing, which re-enables each of them for internal resources.
Motivation
Fixes #9170
Fixes #6861
Additional Notes
This PR drops an access logs test in pkg/server/router because it was redundant with the actual integration tests in place (and would have required some work to be adapted due to PR changes).
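Illustrative sketch

The behaviour described under "What does this PR do?" can be pictured roughly as follows. This is a minimal sketch and not the code from this PR: the ObservabilityConfig type and the shouldObserve helper are hypothetical names made up for illustration, and the sketch assumes that internal resources can be recognised by the @internal suffix mentioned in the conversation. The resource names in main are examples only.

```go
package main

import (
	"fmt"
	"strings"
)

// internalSuffix is assumed to mark resources created by the proxy itself,
// such as "ping@internal".
const internalSuffix = "@internal"

// ObservabilityConfig is a hypothetical stand-in for the AccessLogs, Metrics,
// or Tracing configuration. Only the opt-in flag from the PR description is modelled.
type ObservabilityConfig struct {
	AddInternals bool
}

// shouldObserve reports whether a router or service should produce
// observability data (access logs, metrics, or traces) under cfg.
func shouldObserve(resourceName string, cfg ObservabilityConfig) bool {
	// User-defined resources are always observed.
	if !strings.HasSuffix(resourceName, internalSuffix) {
		return true
	}
	// Internal resources are observed only when explicitly opted in.
	return cfg.AddInternals
}

func main() {
	defaults := ObservabilityConfig{}
	optIn := ObservabilityConfig{AddInternals: true}

	for _, name := range []string{"ping@internal", "whoami@docker"} {
		fmt.Printf("%s: default=%t addInternals=%t\n",
			name, shouldObserve(name, defaults), shouldObserve(name, optIn))
	}
}
```

Keeping one flag per subsystem, as the description suggests, would let an operator keep access logs for internal resources for audit purposes while still dropping health-check spans from tracing.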