Erroneous request method drives /metrics unreadable by Prometheus #1821

Closed
ajardan opened this issue Jul 4, 2017 · 20 comments

@ajardan
Contributor

ajardan commented Jul 4, 2017

Do you want to request a feature or report a bug?

bug

What did you do?

Collected Prometheus stats from traefik /metrics page

What did you expect to see?

Metrics being added to Prometheus

What did you see instead?

None of the metrics were collected by Prometheus, because of erroneous request method, which contained unreadable characters

Output of traefik version: (What version of Traefik are you using?)

# docker run traefik version
Version:      v1.3.1
Codename:     raclette
Go version:   go1.8.3
Built:        2017-06-16_11:21:48AM
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

                "-c",
                "/dev/null",
                "--docker",
                "--docker.watch",
                "--docker.swarmmode",
                "--docker.domain=tld.com",
                "--web",
                "--web.address=:9080",
                "--web.statistics",
                "--web.metrics",
                "--web.metrics.prometheus",
                "--entryPoints=Name:http Address::80",
                "--entryPoints=Name:https Address::443 TLS:/certs/cert.pem,/keys/cert.key"

Here is what /metrics bogus entries look like:

traefik_requests_total{code="500",method="POST",service="http"} 6
traefik_requests_total{code="500",method="POST",service="https"} 20
traefik_requests_total{code="500",method="Ý̪î����rgda����������������������������������������ù������������������������������������������������������������������������������������������������������������������������������������������PY£Ÿx������������������������������������������������������������������������������������������������������������������������com.apple.WebKit����GET",service="backend-api-rt"} 1
traefik_requests_total{code="500",method="Ý̪î����rgda����������������������������������������ù������������������������������������������������������������������������������������������������������������������������������������������PY£Ÿx������������������������������������������������������������������������������������������������������������������������com.apple.WebKit����GET",service="http"} 1
@timoreimann
Contributor

@ldez is this possibly our compression bug again?

@ldez ldez added area/middleware/metrics kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. and removed area/middleware/metrics labels Jul 4, 2017
@ldez
Member

ldez commented Jul 4, 2017

Compression is only activated when compress is set to true on an entrypoint.

[entryPoints]
   [entryPoints.http]
   compress = true

@ajardan Have you activated compression?

@ldez
Member

ldez commented Jul 4, 2017

@ajardan are you redirecting HTTP to HTTPS?

Could you run curl -vv <traefik_url>/metrics and copy-paste the result?
Could you provide debug logs? (traefik --debug)

@ajardan
Contributor Author

ajardan commented Jul 4, 2017

@ldez As far as I can see, we don't have compression enabled. Unless it is enabled by default, of course. We are running traefik as a Docker Swarm service with the params I specified, no config file.

And unfortunately, I don't think I can provide a debug output, since this is in production. Also if I restart the service, I don't know when the issue reappears, could be tomorrow, could be in a month...

If there's any other way I can gather useful info, I would be happy to.

Here is the curl output:

*   Trying x.x.x.x...
* TCP_NODELAY set
* Connected to tld.com (x.x.x.x) port 9080 (#0)
> GET /metrics HTTP/1.1
> Host: tld.com:9080
> User-Agent: curl/7.51.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 19916
< Content-Type: text/plain; version=0.0.4
< Date: Tue, 04 Jul 2017 13:45:33 GMT
< 
{ [3798 bytes data]

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000231656
go_gc_duration_seconds{quantile="0.25"} 0.000305328
go_gc_duration_seconds{quantile="0.5"} 0.000342743
go_gc_duration_seconds{quantile="0.75"} 0.000395275
go_gc_duration_seconds{quantile="1"} 0.013661905
go_gc_duration_seconds_sum 33.403716162
go_gc_duration_seconds_count 86644
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 18002
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 3.53540936e+08
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.3719760947912e+13
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 2.553746e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 4.7876697154e+10
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0.001693048349307275
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 3.1971328e+07
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 3.53540936e+08
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 3.56098048e+08
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 3.72989952e+08
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 1.095528e+06
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 3.28761344e+08
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 7.29088e+08
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.4991759267996163e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 1.99650952e+08
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 4.7877792682e+10
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 28800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 32768
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 6.275624e+06
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 9.66656e+06
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 3.81402224e+08
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.0542438e+07
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 1.29695744e+08
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 1.29695744e+08
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 9.13550584e+08
# HELP go_threads Number of OS threads created
# TYPE go_threads gauge
go_threads 52
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 286380.89
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 65536
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 15518
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 5.97991424e+08
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.49864953371e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 9.58619648e+08
# HELP traefik_request_duration_seconds How long it took to process the request.
# TYPE traefik_request_duration_seconds histogram
traefik_request_duration_seconds_bucket{service="backend-api-bulk",le="0.1"} 6.941684e+06
traefik_request_duration_seconds_bucket{service="backend-api-bulk",le="0.3"} 7.030495e+06
traefik_request_duration_seconds_bucket{service="backend-api-bulk",le="1.2"} 7.079315e+06
traefik_request_duration_seconds_bucket{service="backend-api-bulk",le="5"} 7.110063e+06
traefik_request_duration_seconds_bucket{service="backend-api-bulk",le="+Inf"} 7.19634e+06
traefik_request_duration_seconds_sum{service="backend-api-bulk"} 1.1021058753168425e+06
traefik_request_duration_seconds_count{service="backend-api-bulk"} 7.19634e+06
traefik_request_duration_seconds_bucket{service="backend-api-crystal-ball",le="0.1"} 9
traefik_request_duration_seconds_bucket{service="backend-api-crystal-ball",le="0.3"} 54
traefik_request_duration_seconds_bucket{service="backend-api-crystal-ball",le="1.2"} 63
traefik_request_duration_seconds_bucket{service="backend-api-crystal-ball",le="5"} 63
traefik_request_duration_seconds_bucket{service="backend-api-crystal-ball",le="+Inf"} 63
traefik_request_duration_seconds_sum{service="backend-api-crystal-ball"} 14.359865245
traefik_request_duration_seconds_count{service="backend-api-crystal-ball"} 63
traefik_request_duration_seconds_bucket{service="backend-api-mobile",le="0.1"} 2.00409763e+08
traefik_request_duration_seconds_bucket{service="backend-api-mobile",le="0.3"} 2.00409858e+08
traefik_request_duration_seconds_bucket{service="backend-api-mobile",le="1.2"} 2.00409859e+08
traefik_request_duration_seconds_bucket{service="backend-api-mobile",le="5"} 2.00409859e+08
traefik_request_duration_seconds_bucket{service="backend-api-mobile",le="+Inf"} 2.00409859e+08
traefik_request_duration_seconds_sum{service="backend-api-mobile"} 241223.63495567077
traefik_request_duration_seconds_count{service="backend-api-mobile"} 2.00409859e+08
traefik_request_duration_seconds_bucket{service="backend-api-peephole",le="0.1"} 0
traefik_request_duration_seconds_bucket{service="backend-api-peephole",le="0.3"} 5
traefik_request_duration_seconds_bucket{service="backend-api-peephole",le="1.2"} 10
traefik_request_duration_seconds_bucket{service="backend-api-peephole",le="5"} 10
traefik_request_duration_seconds_bucket{service="backend-api-peephole",le="+Inf"} 10
traefik_request_duration_seconds_sum{service="backend-api-peephole"} 3.348792028
traefik_request_duration_seconds_count{service="backend-api-peephole"} 10
traefik_request_duration_seconds_bucket{service="backend-api-rt",le="0.1"} 7.8699695e+07
traefik_request_duration_seconds_bucket{service="backend-api-rt",le="0.3"} 7.8699746e+07
traefik_request_duration_seconds_bucket{service="backend-api-rt",le="1.2"} 7.8699747e+07
traefik_request_duration_seconds_bucket{service="backend-api-rt",le="5"} 7.8699747e+07
traefik_request_duration_seconds_bucket{service="backend-api-rt",le="+Inf"} 7.8699747e+07
traefik_request_duration_seconds_sum{service="backend-api-rt"} 86249.10955907247
traefik_request_duration_seconds_count{service="backend-api-rt"} 7.8699747e+07
traefik_request_duration_seconds_bucket{service="backend-mon-grafana",le="0.1"} 1773
traefik_request_duration_seconds_bucket{service="backend-mon-grafana",le="0.3"} 2066
traefik_request_duration_seconds_bucket{service="backend-mon-grafana",le="1.2"} 2470
traefik_request_duration_seconds_bucket{service="backend-mon-grafana",le="5"} 2549
traefik_request_duration_seconds_bucket{service="backend-mon-grafana",le="+Inf"} 2590
traefik_request_duration_seconds_sum{service="backend-mon-grafana"} 1190.2373577959986
traefik_request_duration_seconds_count{service="backend-mon-grafana"} 2590
traefik_request_duration_seconds_bucket{service="backend-mon-portainer",le="0.1"} 14
traefik_request_duration_seconds_bucket{service="backend-mon-portainer",le="0.3"} 16
traefik_request_duration_seconds_bucket{service="backend-mon-portainer",le="1.2"} 16
traefik_request_duration_seconds_bucket{service="backend-mon-portainer",le="5"} 16
traefik_request_duration_seconds_bucket{service="backend-mon-portainer",le="+Inf"} 16
traefik_request_duration_seconds_sum{service="backend-mon-portainer"} 0.43985680700000007
traefik_request_duration_seconds_count{service="backend-mon-portainer"} 16
traefik_request_duration_seconds_bucket{service="backend-mon-prometheus",le="0.1"} 18545
traefik_request_duration_seconds_bucket{service="backend-mon-prometheus",le="0.3"} 18723
traefik_request_duration_seconds_bucket{service="backend-mon-prometheus",le="1.2"} 19043
traefik_request_duration_seconds_bucket{service="backend-mon-prometheus",le="5"} 19075
traefik_request_duration_seconds_bucket{service="backend-mon-prometheus",le="+Inf"} 19080
traefik_request_duration_seconds_sum{service="backend-mon-prometheus"} 407.9695189710023
traefik_request_duration_seconds_count{service="backend-mon-prometheus"} 19080
traefik_request_duration_seconds_bucket{service="backend-testsrvc",le="0.1"} 9
traefik_request_duration_seconds_bucket{service="backend-testsrvc",le="0.3"} 9
traefik_request_duration_seconds_bucket{service="backend-testsrvc",le="1.2"} 9
traefik_request_duration_seconds_bucket{service="backend-testsrvc",le="5"} 9
traefik_request_duration_seconds_bucket{service="backend-testsrvc",le="+Inf"} 9
traefik_request_duration_seconds_sum{service="backend-testsrvc"} 0.002143812
traefik_request_duration_seconds_count{service="backend-testsrvc"} 9
traefik_request_duration_seconds_bucket{service="http",le="0.1"} 1.29772167e+08
traefik_request_duration_seconds_bucket{service="http",le="0.3"} 1.2984684e+08
traefik_request_duration_seconds_bucket{service="http",le="1.2"} 1.29875175e+08
traefik_request_duration_seconds_bucket{service="http",le="5"} 1.29875606e+08
traefik_request_duration_seconds_bucket{service="http",le="+Inf"} 1.29875714e+08
traefik_request_duration_seconds_sum{service="http"} 214590.6596774284
traefik_request_duration_seconds_count{service="http"} 1.29875714e+08
traefik_request_duration_seconds_bucket{service="https",le="0.1"} 1.56316116e+08
traefik_request_duration_seconds_bucket{service="https",le="0.3"} 1.56331186e+08
traefik_request_duration_seconds_bucket{service="https",le="1.2"} 1.56352413e+08
traefik_request_duration_seconds_bucket{service="https",le="5"} 1.56382841e+08
traefik_request_duration_seconds_bucket{service="https",le="+Inf"} 1.56469056e+08
traefik_request_duration_seconds_sum{service="https"} 1.2247939369712947e+06
traefik_request_duration_seconds_count{service="https"} 1.56469056e+08
# HELP traefik_requests_total How many HTTP requests processed, partitioned by status code and method.
# TYPE traefik_requests_total counter
traefik_requests_total{code="200",method="DELETE",service="backend-mon-grafana"} 1
traefik_requests_total{code="200",method="DELETE",service="http"} 1
traefik_requests_total{code="200",method="GET",service="backend-api-bulk"} 1.065953e+06
traefik_requests_total{code="200",method="GET",service="backend-api-crystal-ball"} 47
traefik_requests_total{code="200",method="GET",service="backend-api-mobile"} 2.00397947e+08
traefik_requests_total{code="200",method="GET",service="backend-api-peephole"} 4
traefik_requests_total{code="200",method="GET",service="backend-api-rt"} 7.8147405e+07
traefik_requests_total{code="200",method="GET",service="backend-mon-grafana"} 2251
traefik_requests_total{code="200",method="GET",service="backend-mon-portainer"} 14
traefik_requests_total{code="200",method="GET",service="backend-mon-prometheus"} 18730
traefik_requests_total{code="200",method="GET",service="http"} 1.28645785e+08
traefik_requests_total{code="200",method="GET",service="https"} 1.50986566e+08
traefik_requests_total{code="200",method="OPTIONS",service="backend-api-rt"} 8
traefik_requests_total{code="200",method="OPTIONS",service="http"} 8
traefik_requests_total{code="200",method="PATCH",service="backend-mon-grafana"} 2
traefik_requests_total{code="200",method="PATCH",service="http"} 2
traefik_requests_total{code="200",method="POST",service="backend-api-bulk"} 5.740075e+06
traefik_requests_total{code="200",method="POST",service="backend-mon-grafana"} 150
traefik_requests_total{code="200",method="POST",service="backend-mon-portainer"} 2
traefik_requests_total{code="200",method="POST",service="http"} 468108
traefik_requests_total{code="200",method="POST",service="https"} 5.272119e+06
traefik_requests_total{code="200",method="PUT",service="backend-mon-grafana"} 3
traefik_requests_total{code="200",method="PUT",service="http"} 3
traefik_requests_total{code="301",method="GET",service="backend-api-crystal-ball"} 1
traefik_requests_total{code="301",method="GET",service="http"} 4
traefik_requests_total{code="301",method="GET",service="https"} 3
traefik_requests_total{code="301",method="POST",service="http"} 1
traefik_requests_total{code="302",method="GET",service="backend-mon-grafana"} 10
traefik_requests_total{code="302",method="GET",service="http"} 10
traefik_requests_total{code="304",method="GET",service="backend-mon-grafana"} 135
traefik_requests_total{code="304",method="GET",service="http"} 135
traefik_requests_total{code="400",method="GET",service="backend-mon-grafana"} 4
traefik_requests_total{code="400",method="GET",service="http"} 4
traefik_requests_total{code="400",method="POST",service="backend-api-bulk"} 63
traefik_requests_total{code="400",method="POST",service="https"} 63
traefik_requests_total{code="401",method="GET",service="backend-api-crystal-ball"} 11
traefik_requests_total{code="401",method="GET",service="backend-mon-grafana"} 8
traefik_requests_total{code="401",method="GET",service="http"} 19
traefik_requests_total{code="401",method="POST",service="backend-api-bulk"} 389554
traefik_requests_total{code="401",method="POST",service="backend-api-peephole"} 1
traefik_requests_total{code="401",method="POST",service="backend-mon-grafana"} 4
traefik_requests_total{code="401",method="POST",service="http"} 389517
traefik_requests_total{code="401",method="POST",service="https"} 42
traefik_requests_total{code="403",method="GET",service="backend-api-rt"} 495752
traefik_requests_total{code="403",method="GET",service="http"} 298665
traefik_requests_total{code="403",method="GET",service="https"} 197087
traefik_requests_total{code="404",method="CONNECT",service="http"} 4
traefik_requests_total{code="404",method="GET",service="backend-api-bulk"} 3
traefik_requests_total{code="404",method="GET",service="backend-mon-grafana"} 5
traefik_requests_total{code="404",method="GET",service="http"} 2999
traefik_requests_total{code="404",method="GET",service="https"} 2086
traefik_requests_total{code="404",method="HEAD",service="backend-api-mobile"} 1544
traefik_requests_total{code="404",method="HEAD",service="backend-api-rt"} 16432
traefik_requests_total{code="404",method="HEAD",service="http"} 27642
traefik_requests_total{code="404",method="HEAD",service="https"} 1695
traefik_requests_total{code="404",method="OPTIONS",service="http"} 223
traefik_requests_total{code="404",method="OPTIONS",service="https"} 221
traefik_requests_total{code="404",method="POST",service="http"} 14
traefik_requests_total{code="404",method="POST",service="https"} 149
traefik_requests_total{code="404",method="UNKNOWN",service="backend-api-rt"} 1
traefik_requests_total{code="404",method="UNKNOWN",service="http"} 1
traefik_requests_total{code="405",method="GET",service="backend-api-peephole"} 5
traefik_requests_total{code="405",method="GET",service="http"} 5
traefik_requests_total{code="422",method="PUT",service="backend-mon-grafana"} 1
traefik_requests_total{code="422",method="PUT",service="http"} 1
traefik_requests_total{code="500",method="GET",service="backend-api-bulk"} 570
traefik_requests_total{code="500",method="GET",service="backend-api-crystal-ball"} 4
traefik_requests_total{code="500",method="GET",service="backend-api-mobile"} 10368
traefik_requests_total{code="500",method="GET",service="backend-api-rt"} 40146
traefik_requests_total{code="500",method="GET",service="backend-mon-grafana"} 14
traefik_requests_total{code="500",method="GET",service="backend-mon-prometheus"} 11
traefik_requests_total{code="500",method="GET",service="http"} 42167
traefik_requests_total{code="500",method="GET",service="https"} 8946
traefik_requests_total{code="500",method="HEAD",service="backend-api-rt"} 2
traefik_requests_total{code="500",method="HEAD",service="http"} 2
traefik_requests_total{code="500",method="POST",service="backend-api-bulk"} 25
traefik_requests_total{code="500",method="POST",service="backend-mon-grafana"} 1
traefik_requests_total{code="500",method="POST",service="http"} 6
traefik_requests_total{code="500",method="POST",service="https"} 20
traefik_requests_total{code="500",method="›Ã™Ó����rgda����������������������������������������˘������������������������������������������������������������������������������������������������������������������������������������������PY£üx������������������������������������������������������������������������������������������������������������������������com.apple.WebKit����GET",service="backend-api-rt"} 1
traefik_requests_total{code="500",method="›Ã™Ó����rgda����������������������������������������˘������������������������������������������������������������������������������������������������������������������������������������������PY£üx������������������������������������������������������������������������������������������������������������������������com.apple.WebKit����GET",service="http"} 1
traefik_requests_total{code="502",method="GET",service="backend-api-bulk"} 2
traefik_requests_total{code="502",method="GET",service="backend-mon-prometheus"} 339
traefik_requests_total{code="502",method="GET",service="backend-testsrvc"} 9
traefik_requests_total{code="502",method="GET",service="http"} 339
traefik_requests_total{code="502",method="GET",service="https"} 11
traefik_requests_total{code="502",method="POST",service="backend-api-bulk"} 1
traefik_requests_total{code="502",method="POST",service="backend-mon-grafana"} 1
traefik_requests_total{code="502",method="POST",service="http"} 1
traefik_requests_total{code="502",method="POST",service="https"} 1
traefik_requests_total{code="504",method="POST",service="backend-api-bulk"} 94
traefik_requests_total{code="504",method="POST",service="http"} 47
traefik_requests_total{code="504",method="POST",service="https"} 47

@ajardan
Contributor Author

ajardan commented Jul 24, 2017

Just want to update and say this still happens from time to time. Had to restart the service today to fix metrics collection...

@timoreimann
Contributor

/cc @marco-jantke

@ldez
Member

ldez commented Jul 24, 2017

@ajardan we only record the "method" information as received; I think you have a client that sends an invalid "method".

@ajardan
Contributor Author

ajardan commented Jul 24, 2017

That could be the case. So I guess there will be no fix from traefik, and I should open a bug with Prometheus?

Or maybe it makes sense to filter bogus methods, since they have no effect anyway and are blocked?
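
The filtering idea above could be sketched as an allowlist. This is a minimal illustration, not Traefik's actual code: the helper name methodLabel and the placeholder value "OTHER" are hypothetical, chosen only for this example. Any method that is not a standard HTTP method is collapsed into one fixed label value.

```go
package main

import (
	"fmt"
	"net/http"
)

// knownMethods is an allowlist built from the method constants in net/http.
var knownMethods = map[string]bool{
	http.MethodGet:     true,
	http.MethodHead:    true,
	http.MethodPost:    true,
	http.MethodPut:     true,
	http.MethodPatch:   true,
	http.MethodDelete:  true,
	http.MethodConnect: true,
	http.MethodOptions: true,
	http.MethodTrace:   true,
}

// methodLabel is a hypothetical helper: it returns the method unchanged when
// it is on the allowlist and the placeholder "OTHER" (an assumption of this
// sketch) for anything else, including byte garbage sent by a broken client.
func methodLabel(method string) string {
	if knownMethods[method] {
		return method
	}
	return "OTHER"
}

func main() {
	fmt.Println(methodLabel("GET"))
	fmt.Println(methodLabel("\xff\xfergda GET"))
}
```

A side effect of this design is that the number of distinct method label values stays bounded, no matter what clients send.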

@ldez
Member

ldez commented Jul 24, 2017

I think it's more of a Prometheus issue: passing all data through without additional processing seems normal for Traefik, and Prometheus should filter or display these specific cases.

@ldez
Member

ldez commented Jul 24, 2017

I'll close this issue, because I think the question is answered.

@ldez ldez closed this as completed Jul 24, 2017
@ajardan
Contributor Author

ajardan commented Jul 24, 2017

I think this has to be re-opened, since according to the Prometheus documentation (https://prometheus.io/docs/instrumenting/exposition_formats/), the text on the /metrics page has to be UTF-8.

So traefik is violating the format, which seems like a bug in traefik?

@emilevauge

@gouthamve

Also, expanding on prometheus/prometheus#2983 (comment): handling partial scrapes raises so many issues that it won't be supported.

For example, if we have a histogram and we ingest the data only partially, then the queries will start giving weird results.
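
To illustrate the histogram concern: Prometheus histogram buckets are cumulative, so the counts must never decrease as le grows, and the +Inf bucket must equal the _count series. The sketch below checks that invariant. The first data set is the backend-mon-grafana histogram from the output above; the second is a hypothetical mix of newer values for the first three buckets with older values for the rest, which is one assumed way a partially ingested scrape could look.

```go
package main

import "fmt"

// cumulativeOK reports whether bucket counts (ordered by increasing le and
// ending with the +Inf bucket) are non-decreasing and whether the +Inf
// bucket matches the histogram's _count value.
func cumulativeOK(buckets []uint64, count uint64) bool {
	if len(buckets) == 0 {
		return false
	}
	for i := 1; i < len(buckets); i++ {
		if buckets[i] < buckets[i-1] {
			return false
		}
	}
	return buckets[len(buckets)-1] == count
}

func main() {
	// Complete scrape: values taken from the backend-mon-grafana histogram.
	full := []uint64{1773, 2066, 2470, 2549, 2590}
	fmt.Println(cumulativeOK(full, 2590)) // true

	// Hypothetical partial ingestion: the first three buckets are newer
	// (invented) values, the last two are left over from the older scrape.
	mixed := []uint64{1900, 2300, 2700, 2549, 2590}
	fmt.Println(cumulativeOK(mixed, 2590)) // false
}
```

Any quantile estimate computed from the second data set would be based on buckets that contradict each other.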

@ldez ldez reopened this Jul 24, 2017
@ldez ldez added kind/bug/confirmed a confirmed bug (reproducible). and removed kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. labels Jul 24, 2017
@ldez
Member

ldez commented Jul 27, 2017

@ajardan I made a quick fix, but I need to explore other approaches.
I will open a PR soon (I hope).

@ajardan
Contributor Author

ajardan commented Jul 27, 2017

Whoa, that was fast, thanks a lot @ldez !

@m3co-code
Contributor

@ldez have you moved forward with this so far? I think the bug is actually "located" in the Prometheus client library for Go and is already documented as an issue here: prometheus/client_golang#274

Personally, I have the feeling that Traefik should not do any additional validation of inputs in this regard. In the end, that's what the client library is for.

@ajardan obviously you have some misbehaving client. Would it be possible for you to track it down and fix it on their side?

@ajardan
Contributor Author

ajardan commented Aug 10, 2017

@marco-jantke Unfortunately I don't. And even if I could do that, this is not a proper fix, since nothing stops another random guy sending requests like this and breaking monitoring.

I guess traefik was not designed as an intranet tool where we all have control over our clients, so this could be used by "bad guys".

And yes, fixing this in the Prometheus client library makes sense if it is used by traefik, since this will fix the root cause, not the consequences.

@m3co-code
Contributor

I asked what an implementation of label value validation in the Prometheus client library should look like. The maintainer of the project suggested that the library should immediately error out when it receives invalid UTF-8 data. For our concrete usage in Traefik, this means that we have to validate all data that could be invalid (user input) before passing it to the library. At the moment, that should only be the method. I will create a PR for this.
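
The validation step described above could look roughly like this. It is a sketch under assumptions, not the fix that was eventually merged: the helper name safeLabelValue and the placeholder "NON_UTF8_METHOD" are hypothetical. The idea is to check the user-supplied value with utf8.ValidString before it is handed to the metrics library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// invalidMethodPlaceholder is a made-up label value for this sketch.
const invalidMethodPlaceholder = "NON_UTF8_METHOD"

// safeLabelValue is a hypothetical helper: it returns v unchanged when it is
// valid UTF-8 and a fixed placeholder otherwise, so that the metrics library
// never receives bytes it cannot expose in the text format.
func safeLabelValue(v string) string {
	if utf8.ValidString(v) {
		return v
	}
	return invalidMethodPlaceholder
}

func main() {
	fmt.Println(safeLabelValue("GET"))
	fmt.Println(safeLabelValue("\xff\xfergda GET"))
}
```

Unlike an allowlist of known methods, this check only guarantees a well-formed exposition output; unusual but valid method strings still pass through as their own label values.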

@ldez
Member

ldez commented Aug 17, 2017

@marco-jantke I have already done that. Please don't make a PR for that.

@m3co-code
Contributor

Ok great. Let me know when you need someone to review :)

@ldez ldez self-assigned this Aug 29, 2017
@traefiker traefiker added this to the 1.4 milestone Sep 8, 2017
@traefiker
Contributor

Closed by #2081.

@traefik traefik locked and limited conversation to collaborators Sep 1, 2019