Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for backend_retries_total #1504

Merged
merged 1 commit into from Jun 7, 2017

Conversation

mjeri
Copy link
Contributor

@mjeri mjeri commented Apr 27, 2017

This PR adds metrics for the backend_retries_total. On the way to implement this, I did a couple of clean up tasks and improved the testing coverage of the existing implementation.

Cleanup tasks:

  • I renamed variables/methods for the request duration metrics, from latency to requestDuration. The concept was confused before in the code and in fact the request duration is tracked and the exported prometheus name was traefik_request_duration_seconds all the time. Therefore this is no breaking change, just clarification inside the code base.
  • I remove the Handler() method from the Metrics interface as this is a concern not relevant for consumers of the interface, which are concerned about the collection of metrics. Also the method was not even used.
  • I made the ResponseRecorder in the retry middleware private, as this is something you would not expect from the package interface to provide.

Apart of this I added unit tests for the retry.go and the parts I touched in the server.go. Especially the server.go method is hard to test, due to its current design, but I hope it is a first step and having the file will motivate others to increase the coverage there as well.

@mjeri
Copy link
Contributor Author

mjeri commented Apr 27, 2017

I will fix the build tomorrow morning, please don't hesitate to have a look at the PR already. That happens when you only execute make test-unit and make test-integration locally 🐼

@ldez ldez added area/api kind/enhancement a new or improved feature. labels Apr 27, 2017
Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I work with Marco, we have reviewed the PR internally over the last few days. Hence, I think it makes sense if I give my LGTM. :-)

@ldez
Copy link
Member

ldez commented Apr 27, 2017

Welcome and thanks for your contribution!
Can you fix the "formatting bug" ? https://travis-ci.org/containous/traefik/jobs/226464395#L826

@mjeri
Copy link
Contributor Author

mjeri commented Apr 28, 2017

Thanks for the welcome and for the introduction @timoreimann. I should probably also have said a couple of words :D I am happy to be able to contribute to the project. Linting errors are fixed by the way.

@mjeri
Copy link
Contributor Author

mjeri commented May 2, 2017

Hi @vdemeester and @emilevauge. Can you have a look at this PR? @timoreimann told me that one of you has to give the approve the PR in order to get it merged :)

@mjeri
Copy link
Contributor Author

mjeri commented May 12, 2017

I just updated PR against latest master again. @vdemeester or @emilevauge can you have a look?

@ldez ldez added this to the 1.4 milestone May 12, 2017
Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @marco-jantke, thanks a lot for your contribution!
I made few comments, I also what has been done in server.go could be improved a bit :)

if err != nil {
e, ok := err.(stdprometheus.AlreadyRegisteredError)
if !ok {
panic(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, don't use panic, return an error instead. You shouldn't stop a reverse proxy because of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I rebuilt it now in a way that if any of the Prometheus Metrics can not be registered, an error will be returned and the *Prometheus will be nil. This will lead to no Metrics instrumentation and an log message on Error level. Does it make sense to you this way?

}

// NewRetry returns a new Retry instance
func NewRetry(attempts int, next http.Handler) *Retry {
func NewRetry(attempts int, next http.Handler, metrics RetryMetrics) *Retry {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to add an interface here, instead of passing RetryMetrics directly. WDYT ?

type Listener interface {
	Retried(attempt int) error
}

Copy link
Contributor Author

@mjeri mjeri May 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually I refrain from building abstractions before they are actually required. Therefore personally I would rather leave this part as is, extending it with a more generic implementation once we have an use-case for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emilevauge WDYT? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry middleware should be loosely coupled to the metrics middleware. If I agree with you on not building abstractions too early, here we rather need isolation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see your point and I am fine with implementing the abstraction layer. Though I would like to make sure that we are on the same page with our understanding of the code.

The Retry middleware is not coupled to the metrics middleware. It only knows about the RetryMetrics interface, which happens to be right now in the middleware package (IMHO this is the rather confusing part). The Metrics and RetryMetrics interfaces are in my opinion already a layer of abstraction. The metrics middleware in turn also "only" depends on the Metrics interface.

Though the abstraction you suggest would remove all knowledge about metrics from the retry middleware. Do I understand your intention correctly? If so and you suggest that way, I will adapt the PR accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though the abstraction you suggest would remove all knowledge about metrics from the retry middleware. Do I understand your intention correctly?

Exactly :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the feedback. I implemented it now. See 7ff5692146cc1e0f8a804a1ef54e59cfeb4e4fb1

server/server.go Outdated
@@ -180,7 +180,8 @@ func (server *Server) startHTTPServers() {
serverMiddlewares := []negroni.Handler{server.accessLoggerMiddleware, server.loggerMiddleware, metrics}
if server.globalConfiguration.Web != nil && server.globalConfiguration.Web.Metrics != nil {
if server.globalConfiguration.Web.Metrics.Prometheus != nil {
metricsMiddleware := middlewares.NewMetricsWrapper(middlewares.NewPrometheus(newServerEntryPointName, server.globalConfiguration.Web.Metrics.Prometheus))
promMetrics, _ := middlewares.NewPrometheus(newServerEntryPointName, server.globalConfiguration.Web.Metrics.Prometheus)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you use newMetrics here too ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed it.

@mjeri mjeri force-pushed the add-retry-metrics branch 2 times, most recently from 716162c to 5d9e9ba Compare May 16, 2017 06:43
@mjeri
Copy link
Contributor Author

mjeri commented May 16, 2017

Closing/Reopening to get SemaphoreCI working.

@mjeri
Copy link
Contributor Author

mjeri commented May 19, 2017

@emilevauge can you have a look again? :)

@ldez
Copy link
Member

ldez commented May 25, 2017

@marco-jantke could you resolve the conflicts?

@mjeri
Copy link
Contributor Author

mjeri commented May 30, 2017

Ok, everything should be ready now.

@mjeri
Copy link
Contributor Author

mjeri commented Jun 6, 2017

@emilevauge ping

return nil
}

func registerRetryMiddleware(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-jantke could you remove this new function? It is called only one time, has too few lines and is then hard to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically introduced to enable testing, that the middleware is constructed properly. I think this would not be possible anymore, when I inline the logic back into the loadConfig. So I can remove it of course if you prefer it this way, but I guess then the test also has to die.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I was not thinking about this ;) It's all good then!

Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work @marco-jantke
LGTM

@timoreimann timoreimann merged commit e007bb7 into traefik:master Jun 7, 2017
@mjeri
Copy link
Contributor Author

mjeri commented Jun 7, 2017

Thanks for the reviews and merging 🎉

@mjeri mjeri deleted the add-retry-metrics branch June 7, 2017 07:22
@ldez ldez changed the title add metrics for backend_retries_total Add metrics for backend_retries_total Aug 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants