Skip to content

Conversation

@timoreimann
Copy link
Contributor

@timoreimann timoreimann commented Jul 20, 2017

This change adds support for Marathon readiness checks: If the option is set on Traefik and applications are found to come with configured readiness checks, the check results (as conducted by Marathon and exposed through its API) will be taken into account at deployment times
in order to decide if a task should be taken into load-balancing rotation or not.

Note that I had to extend the event filter that controls how often the Marathon provider polls the API so that the readiness checker is able to pick up all necessary readiness state transitions.

Fixes #1185, #1559

Based on #1871 for now; will be rebased onto master once the dependent branch gets merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: Ideally, I'd like to support logging in "trace level" more generally (i.e., without having to pass the trace flag into a particular function that needs this information).

I'm open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename logf to tracef

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, much better. Done.

Copy link
Contributor Author

@timoreimann timoreimann Jul 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: The readiness checker logic should be fairly clear except for this corner case (hence the lengthy comment). Internally, we went through several iterations to improve the code and iron out corner cases (though with Bamboo instead of Traefik, the logic is identical -- I basically copy & pasted the implementation). It's been running in our production system for several weeks now, so it should have gained a fair level of robustness.

Nevertheless, I'd happy for suggestions to make it simpler/better.

@timoreimann
Copy link
Contributor Author

/cc @jangie @stibbons

@timoreimann timoreimann force-pushed the marathon-readiness-checks branch from da238f9 to 18e7e69 Compare July 20, 2017 19:27
@ldez ldez force-pushed the marathon-test-builder branch from 6dcc278 to 4546613 Compare July 21, 2017 14:57
@timoreimann timoreimann changed the base branch from marathon-test-builder to master July 21, 2017 19:50
@timoreimann timoreimann force-pushed the marathon-readiness-checks branch from 18e7e69 to 50489fd Compare July 21, 2017 19:50
@timoreimann
Copy link
Contributor Author

@ldez commit 50489fd removes the prefixes again. Do I understand correctly that this is how we want things now? Please let me know.

@emilevauge
Copy link
Member

@timoreimann wow, impressive job 👏
I'm not a huge fan of the naming respectReadinessChecks ;) But I can live with that, even if I tend to prefer shorter names for options (I can see the Java lover behind this ^^).
This PR is quite big&risky and it would be even better if we could have some integration tests on this (#1406). I know you are waiting for docker-archive-public/docker.libcompose#484 to complete: @vdemeester we really need your help on this one :)
LGTM on design

@timoreimann
Copy link
Contributor Author

The Javaness in me is less of a love and more of a bad, hard-to-shake-off habit. 😬

I'm very open to shortening the option name. respectReadiness? filterReady[Tasks]? onlyReady[Tasks]? readyOnly? Any suggestions?

While an integration test would be nice, I can say from experience with this feature that covering all cases will be difficult: It depends a lot on how applications/tasks behave and the timing involved. It took us two or three internal patches to get this right with Bamboo. On the plus side, this means the algorithm is battle-tested to a certain degree already. And users can turn on/off readiness check filtering on demand.

That said, if someone can help out on the upstream libcompose blocker, I'd be more than happy to contribute an integration test. 👼

@timoreimann timoreimann force-pushed the marathon-readiness-checks branch from 50489fd to beafbc7 Compare July 24, 2017 19:03
@ldez ldez added this to the 1.4 milestone Jul 29, 2017
Copy link
Member

@juliens juliens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Do you know if the problem of "bad readyness check result" will be fixed in marathon, because this add a lots of useless and complicated code in traefik just in order to handle this "marathon bug".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename logf to tracef

},
}

for _, c := range cases {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer test or case instead of c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't use case because that's a reserved keyword in Go, so I renamed everything to test.

@timoreimann
Copy link
Contributor Author

@juliens:

Do you know if the problem of "bad readyness check result" will be fixed in marathon, because this add a lots of useless and complicated code in traefik just in order to handle this "marathon bug".

Last time I checked I haven't seen a bug report on the matter. I'll double-check, file a report if I find nothing, and post it here.

Copy link
Contributor

@ldez ldez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ldez ldez added the size/L label Aug 16, 2017
Copy link
Contributor

@nmengin nmengin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👏 👏

This change adds support for Marathon readiness checks: If the option is
set on Traefik and applications are found to come with configured
readiness checks, the check results (as conducted by Marathon and
exposed through its API) will be taken into account at deployment times
in order to decide if a task should be taken into load-balancing
rotation or not.

Note that I had to extend the event filter that controls how often the
Marathon provider polls the API so that the readiness checker is able to
pick up all necessary readiness state transitions.
@traefiker traefiker force-pushed the marathon-readiness-checks branch from eb7c72f to 7fb888a Compare August 18, 2017 00:42
@traefiker traefiker merged commit ea3510d into master Aug 18, 2017
@ldez ldez deleted the marathon-readiness-checks branch August 18, 2017 06:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants