More detailed metrics #156

Closed
keis opened this issue Jan 8, 2016 · 11 comments

@keis

keis commented Jan 8, 2016

It would be great if Traefik could provide some more detailed metrics, e.g.:

  • Response time quantiles, to avoid skew from one or two outliers (like Prometheus does)
  • Breakdown by backend
    • request count
    • response time
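To illustrate why a quantile is less sensitive to outliers than an average, here is a minimal sketch that groups response times by backend and reports the request count, mean and median for each. The backend names and timings are made up for the example, and this is plain Python for illustration, not Traefik code.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical samples: (backend name, response time in seconds).
samples = [
    ("backend-a", 0.10), ("backend-a", 0.12), ("backend-a", 0.11),
    ("backend-a", 0.09), ("backend-a", 5.00),  # one slow outlier
    ("backend-b", 0.20), ("backend-b", 0.22), ("backend-b", 0.21),
]


def summarize(samples):
    """Group response times by backend and report count, mean and median."""
    by_backend = defaultdict(list)
    for backend, seconds in samples:
        by_backend[backend].append(seconds)
    return {
        backend: {
            "count": len(times),
            "mean": mean(times),
            "median": median(times),
        }
        for backend, times in by_backend.items()
    }


stats = summarize(samples)
for backend, s in sorted(stats.items()):
    print(backend, s["count"], round(s["mean"], 3), s["median"])
```

For backend-a, the single 5-second request pulls the mean above one second, while the median stays at the typical value of 0.11 seconds.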
@emilevauge added the kind/enhancement label (a new or improved feature) Jan 8, 2016
@thawkins

Definitely +1 on this.

I would also add response status to that list.

@keis
Author

keis commented May 23, 2016

Most of the heavy lifting for this is done by the logger middleware now. It should be pretty simple to add some Prometheus counters to that. Not sure if that's the best place for doing it or if Prometheus is the right tool. I do quite like it though :)
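The idea of counting inside a middleware can be sketched roughly as follows. This is a language-neutral illustration in Python, not Traefik's actual logger middleware (which is written in Go); `with_request_counter`, `demo_handler` and the backend name are hypothetical, and a plain `Counter` stands in for a Prometheus counter.

```python
from collections import Counter


def with_request_counter(handler, counts, backend):
    """Wrap a handler so every call increments counts[(backend, status)]."""
    def wrapped(request):
        status = handler(request)
        counts[(backend, status)] += 1
        return status
    return wrapped


# Hypothetical handler: returns an HTTP status code for a request path.
def demo_handler(path):
    return 200 if path == "/" else 404


counts = Counter()
handler = with_request_counter(demo_handler, counts, backend="backend-a")
for path in ["/", "/", "/missing"]:
    handler(path)

print(dict(counts))
```

Keying the counter by backend and status code covers both the per-backend breakdown and the response status suggested earlier in the thread.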

@samber
Contributor

samber commented May 25, 2016

I think the most interesting metrics are:

  • traefik build id
  • per entrypoint:
    • Number of active connections
  • per backend type (consul catalog, file, etcd, kubernetes...):
    • Number of successful backend updates
    • Number of failed backend updates
    • number of caught updates
    • number of effective reloads
  • per service:
    • circuit breaker:
      • number of backends available
      • number of failing backends
    • frontend:
      • response time (including Traefik processing)
      • number of successful requests (1xx, 2xx, 3xx)
      • number of failing requests (4xx, 5xx)
      • request size
      • response size
    • per backend:
      • response time (excluding Traefik processing)
      • number of successful requests (1xx, 2xx, 3xx)
      • number of failing requests (4xx, 5xx)
      • retry count
      • circuit breaker:
        • is available (bool)
        • expression (example: NetworkErrorRatio() > 0.5)
        • number of network errors

Prometheus is probably the trendiest time series DB. IMO, that's why we should add a /metrics route to the web provider, using the Prometheus data format.

Prometheus can aggregate backend stats to get the avg or sum of each metric, so adding "total" counts would only be useful for /health requests.
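For reference, here is a minimal sketch of what a /metrics response body in the Prometheus text exposition format could look like for one counter broken down by backend and status code. The metric name `traefik_requests_total`, the label names and the values are assumptions for illustration only, not names Traefik defines.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    samples maps a tuple of (label name, label value) pairs to the counter value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical per-backend, per-status request counts.
samples = {
    (("backend", "backend-a"), ("code", "200")): 1027,
    (("backend", "backend-a"), ("code", "500")): 3,
    (("backend", "backend-b"), ("code", "200")): 412,
}

body = render_counter(
    "traefik_requests_total",  # hypothetical metric name
    "Requests handled, by backend and status code.",
    samples,
)
print(body, end="")
```

With per-backend series like these, a total can be derived at query time (for example with a PromQL `sum(...)` aggregation), which is why a separate "total" metric is not needed on the Prometheus side.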

@thawkins

thawkins commented May 25, 2016

+1 on Prometheus integration. Prometheus will be available out of the box in the next release of Kubernetes. Having good integration would be excellent.

@samber
Contributor

samber commented May 31, 2016

Last week, I made a standalone Prometheus exporter: https://github.com/iadvize/traefik-exporter

I will combine that work with the Traefik binary very soon.

One example of a limitation of using an external exporter: we cannot use Prometheus summaries to measure the duration of the slowest 1% of requests.
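A rough illustration of that limitation, assuming the external exporter can only see aggregate values such as a request count and a total duration: two very different latency distributions can share the same count and sum, so the tail cannot be reconstructed after the fact. The durations below are invented, and `percentile` is a simple nearest-rank helper written for this example, not a Prometheus API.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Two hypothetical sets of 100 request durations in milliseconds.
steady = [200] * 100
spiky = [100] * 95 + [2100] * 5

# The aggregates an external exporter might see are identical...
print(len(steady), sum(steady), len(spiky), sum(spiky))

# ...but the 99th percentile, computed from the raw observations, is not.
print(percentile(steady, 99), percentile(spiky, 99))
```

Both sets average 200 ms, yet the 99th percentile is 200 ms in one case and 2100 ms in the other. Observing each request duration inside the proxy is what makes such quantiles available.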

@therc

therc commented Jun 2, 2016

It would be great to also have lists and rates for the top N client IPs or for other fields defined in the configuration (top Host:, Authentication: headers). Maybe even have an endpoint to add IPs and headers to track manually, even if they're not in the top N. This would allow users to run a DDoS detection/mitigation system or a throttler alongside Traefik. I doubt most people will want either of those built in, or that a built-in version would meet all their needs.

@brian-brazil

Top N isn't really well suited to Prometheus, as it'll cause too much label churn - particularly on this type of system which may be talked to by the entire internet.
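To make the label churn concern concrete: each distinct combination of label values becomes its own time series. The sketch below counts the series that a hypothetical traffic sample would produce with and without a client IP label; the label names and the generated addresses are assumptions for the example.

```python
def series_count(observations, label_names):
    """Count distinct label combinations, i.e. distinct time series."""
    return len({tuple(obs[name] for name in label_names) for obs in observations})


# Hypothetical traffic: 1000 distinct client IPs spread over 2 backends.
observations = [
    {"backend": f"backend-{i % 2}", "client_ip": f"10.0.{i // 256}.{i % 256}"}
    for i in range(1000)
]

print(series_count(observations, ["backend"]))
print(series_count(observations, ["backend", "client_ip"]))
```

Labelled by backend only, this sample yields 2 series. Adding the client IP as a label yields 1000, and on an internet-facing proxy that number keeps growing as new clients appear.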

All the other use cases mentioned are perfect for Prometheus. Maybe add last update time too? The exposition format is open, and there are parsers currently available in Go and Python, so it's possible to integrate with other monitoring systems too.

@tlvenn

tlvenn commented Nov 5, 2016

HAProxy has a pretty useful dashboard when it comes to metrics, probably a good path to follow

http://demo.haproxy.org/

It would be very nice to have something similar out of the box with Traefik.

@arehmandev

arehmandev commented Jan 17, 2017

Looks like metrics have now been implemented:

175659a

@enxebre
Contributor

enxebre commented Jan 19, 2017

This is partially covered now by #1022 and #1042, which should serve as a foundation for adding more detailed metrics.

@timoreimann
Contributor

I'm going to close this issue as basic support has shipped.

The list in #156 (comment) and @brian-brazil's suggestion in #156 (comment) regarding uptime should serve as a great basis for future additions. Please file dedicated issues for additional metrics.

@traefik traefik locked and limited conversation to collaborators Sep 1, 2019