New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more prometheus metrics #1405
Conversation
@jereksel you cannot change vendored code directly; you have to make a PR against upstream and afterwards revendor the updated library in Traefik (i.e., use glide to bump our referenced version). For the time being though, I think it's okay to pretend this is the oxy change we want, do the review, and eventually update containous/oxy (our fork of vulcand/oxy) should we all agree to the suggested path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to have more metrics, appreciated!
Left a few comments. One thing I'm not clear on is whether the oxy change can be implemented differently (i.e., without touching the oxy package). I'm just not too deep into that part of the code.
@containous/traefik any thoughts on this one?
@@ -19,13 +24,41 @@ type Metrics interface { | |||
// given Metrics implementation to expose and monitor Traefik metrics | |||
type MetricsWrapper struct { | |||
Impl Metrics | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the blank line.
server.go
Outdated
server.backendReloadCounter = prometheus.NewCounterFrom( | ||
stdprometheus.CounterOpts{ | ||
Name: "traefik_backend_reload_total", | ||
Help: "How many backend requests, partitioned by state and type.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description seems wrong?
func NewMetricsWrapper(impl Metrics) *MetricsWrapper { | ||
func NewBackendMetricsWrapper(impl Metrics) *MetricsWrapper { | ||
|
||
var f = func(r *http.Request) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do f :=
here?
@@ -19,13 +24,41 @@ type Metrics interface { | |||
// given Metrics implementation to expose and monitor Traefik metrics | |||
type MetricsWrapper struct { | |||
Impl Metrics | |||
|
|||
//frontend/backend | |||
typ string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the fact that typ
isn't proper English, I also find it too generic. We should try to come up with a better name.
Maybe "phase"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about naming it type
, but it's reserved keyword. Phase sound good thought.
|
||
//frontend/backend | ||
typ string | ||
getName fn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be more specific. What kind of name? getServerName
maybe?
"net/http" | ||
) | ||
|
||
//Transformer is a hacky way to get metrics middleware with frontend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space between leading double dashes and first word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I get why we need the transformer. Could you explain please?
return &Transformer{next, metrics} | ||
} | ||
|
||
func (sb *Transformer) ServeHTTP(rw http.ResponseWriter, r *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the name sb
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied it from another file 😄
server.go
Outdated
@@ -88,6 +92,13 @@ func NewServer(globalConfiguration GlobalConfiguration) *Server { | |||
// leadership creation if cluster mode | |||
server.leadership = cluster.NewLeadership(server.routinesPool.Ctx(), globalConfiguration.Cluster) | |||
} | |||
server.backendReloadCounter = prometheus.NewCounterFrom( | |||
stdprometheus.CounterOpts{ | |||
Name: "traefik_backend_reload_total", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should say "reloads" (plural) according to the Prometheus naming guidelines.
server.go
Outdated
@@ -219,17 +230,22 @@ func (server *Server) listenProviders(stop chan bool) { | |||
return | |||
case configMsg, ok := <-server.configurationChan: | |||
if !ok { | |||
server.backendReloadCounter.With("state", "failed", "type", configMsg.ProviderName).Add(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the case where the channel gets closed, which (AFAIU) could be for legitimate reasons. Hence, we should not count this as a failed reload event.
server.go
Outdated
server.defaultConfigurationValues(configMsg.Configuration) | ||
currentConfigurations := server.currentConfigurations.Get().(configs) | ||
jsonConf, _ := json.Marshal(configMsg.Configuration) | ||
log.Debugf("Configuration received from provider %s: %s", configMsg.ProviderName, string(jsonConf)) | ||
if configMsg.Configuration == nil || configMsg.Configuration.Backends == nil && configMsg.Configuration.Frontends == nil { | ||
log.Infof("Skipping empty Configuration for provider %s", configMsg.ProviderName) | ||
labels = []string{"state", "failed", "type", configMsg.ProviderName} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only parameter that changes between the different occasions where we assign the slice is the state. Can we initialize the slice with the other parameters in line 236 right away and just complete the state in the subsequent lines, respectively?
@@ -19,13 +24,41 @@ type Metrics interface { | |||
// given Metrics implementation to expose and monitor Traefik metrics | |||
type MetricsWrapper struct { | |||
Impl Metrics | |||
|
|||
//frontend/backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment should maybe be a bit more elaborate.
Also, space between slash and first word.
@jereksel do you the time to update this PR before the upcoming feature freeze for the 1.3 release? |
Sure, I'll try to fix this MR today. |
@idez @timoreimann Maybe I'll close this and redo it with #1408 and #1485 |
@jereksel simply rebase your branch, like that you'll already have the content of the #1408 |
Yep, rebase is the way to go. We have also restructured/moved some packages. (For instance, |
I'll wait for #1485 - I think I have an idea how to combine it with Prometheus metrics. |
This MR implement more metrics mentioned here.
It implements more detailed backend metrics (it now includes backend server), frontend metrics (see below) and config reload metrics.
Unfortunatelly there doesn't seem to be a clean way to get more information about frontend without rewriting route management.
I'm also not sure about frontend/backend times. I've included debug start/stop messages for backend and frontend and these are the results:
Which is very weird because backend is invoked before frontend and I don't know how to add time metrics that will correctly calculate frontend and backend response times.