Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove observability for internal resources #9633

Merged
merged 14 commits into from Jan 30, 2024

Conversation

rtribotte
Copy link
Member

@rtribotte rtribotte commented Jan 2, 2023

What does this PR do?

This PR disables access logs, metrics, and tracing, for internal routers and services.
It also introduces an addInternals option for AccessLogs, Metrics, and Tracing, to enable the latter for internal resources.

Motivation

Fixes #9170
Fixes #6861

More

  • Added/updated tests
  • Added/updated documentation

Additional Notes

This PR drops an access logs test in pkg/server/router because it was redundant with actual integration tests in place (and would have required some work to be adapted due to PR changes).

Copy link
Member

@juliens juliens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rtribotte rtribotte force-pushed the remove-internals-observability branch from 6971539 to 8eb29ea Compare January 9, 2023 17:05
@netsandbox
Copy link

I'm not sure if unconditionally dropping observability for all @internal resources is the best way, because maybe there are use cases where you want or need this information.

For example if you secure the Prometheus metrics with BasicAuth and want to see the remote IP for the HTTP 403 errors, you need the information for the prometheus@internal service in the access log.
For the same use case you may also need the Prometheus metrics for prometheus@internal to monitor the HTTP 4xx errors.

@dhirschfeld
Copy link

I'm not sure if unconditionally dropping observability for all @internal resources is the best way, because maybe there are use cases where you want or need this information.

I'm just piping up as a user who'd really like the ability to turn off these logs as they just add noise and make debugging harder in the common case.

If you're trying to debug an issue with internal services then I guess not having any logs is equally as painful.

Perhaps the logging for internal services could be configurable so you could set it to e.g. only log errors rather than every successful ping request.

@rtribotte
Copy link
Member Author

rtribotte commented Apr 18, 2023

Hello @netsandbox @dhirschfeld,

Thanks for your feedback!

We did think about adding an option to get the observability back for internal resources when opening the PR.
But we also thought at that time that we wanted to gather first feedback to see the traction around it before adding yet another option.

This makes total sense and we are going to rework the PR to add the option.

@rtribotte
Copy link
Member Author

Hello @netsandbox @dhirschfeld,

We changed our mind about introducing the opt-in option to get back internal observability in this PR.
We want to address it in another PR and to have first the opportunity to gather more feedback on the need, to determine what would be the best approach for this option.
Would you be inclined to open a new issue to discuss it?

@jpds
Copy link

jpds commented May 16, 2023

This should really be implemented with conditional logging as nginx does: https://nginx.org/en/docs/http/ngx_http_log_module.html and also be off by default.

Speaking as someone with a systems security background - there's another class of software out there that hides what it's doing from operators by default, and that's malware. Here's the Prometheus metrics for my internal resources for the past 5 days:

Screenshot from 2023-05-16 10-40-08

From an audit perspective - there's an obvious baseline here for what's "normal" requests, but then there's a sudden increase in requests to a Traefik instance in the last hour - is that:

  1. Me just running curl against an internal endpoint
  2. A confused internal user accessing the wrong port?
  3. A misconfigured Prometheus instance?
  4. A attacker that's probing whatever endpoints they can (be it from a foothold they've gained into the internal network (if so, where did those requests come from?), or through a misconfigured firewall)?

Because without logs/metrics - you cannot even reason about any of the above.

@zetaab
Copy link
Contributor

zetaab commented Nov 7, 2023

@rtribotte @juliens what is status of this PR could it be merged? As I see it, it would be nice to ignore these internal services

@ldez
Copy link
Member

ldez commented Nov 7, 2023

to be merged a PR needs to be up to date, CI ok, and 3 approves

@rtribotte rtribotte modified the milestones: 3.0, next Dec 20, 2023
@rtribotte
Copy link
Member Author

Hello,

We decided to move the target of this PR to the next milestone, as we changed our minds about making it optional on another iteration.
The next step is to introduce an option to control the observability of the internal resources.

@rtribotte rtribotte changed the base branch from v3.0 to master December 20, 2023 13:02
@justinabrahms
Copy link

I need this change as traefik is producing too many useless spans (health checks) to be economical for us.

Aside from getting CI happy, what needs to be done before folks feel ready to approve? Is it at least directionally correct?

@rtribotte rtribotte force-pushed the remove-internals-observability branch from 8eb29ea to 182caff Compare January 22, 2024 10:42
@rtribotte rtribotte changed the base branch from master to v3.0 January 22, 2024 10:42
@nmengin nmengin added the priority/P1 need to be fixed in next release label Jan 30, 2024
Copy link
Contributor

@nmengin nmengin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

@comminutus
Copy link

What version will this be included in next and how can one activate / use it?

@rtribotte rtribotte removed their assignment Feb 26, 2024
@rtribotte rtribotte deleted the remove-internals-observability branch March 15, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Stop Tracing Internal Traffic (ping@internal, prometheus@internal, etc.) Filtering out ping access logs