Refactor into dual Rancher API/Metadata providers #1563
Good morning @martinbaillie ,
first of all: thanks a LOT for your PR 😍
I would personally vote for making metadata the default client and letting the user actively decide to switch to the API if their use case needs it. I can think of some cases which would need API access, but again, the user would actively have to choose that.
I like the separation at file level, btw. It makes things clearer.
Apart from that, I cannot find anything I don't like. Good job here!
So please, put metadata as the default in place :)
Force-pushed from d6008e8 to 5f5be5f.
@SantoDE agreed! Implemented the metadata service as the default 👍 Previous API users will simply need … I also worked through those Glide issues (Golang's …
Due to SemaphoreCI, I closed and reopened the PR.
@martinbaillie could you pick the #1414 changes into your PR? And add some tests for this part?
@ldez - sure, done. As for tests, unfortunately I struggled to find anything new to test. This refactor either reuses/modifies existing Traefik funcs (and I updated those respective tests) or otherwise makes use of the official Rancher metadata client, which would really need to be tested in the integration test suite for Rancher (currently missing; I believe there's a separate issue to address that as a whole).
@martinbaillie thanks for incorporating my change from #1414. Can you test the scenario mentioned in #1414? I suspect it won't work anyway because the metadata returned is for that environment only. But no harm in testing, I guess.
@bseng all good, the new metadata service based provider does not use the functions you modified. The functions have simply been renamed and moved to an …
G'day folks. Wondering if we're waiting on anything else from me here? Or is it just scheduling? The PR has gotten a little out of date. I'm happy to sync it up again if it'll get merged shortly after.
ping @SantoDE
Hey @martinbaillie,
thanks a lot! :) This looks (apart from the minor documentation thingy) good :)
LGTM 👼
traefik.sample.toml (outdated)
#
# Required
# Required (unless EnableMetadataService = true)
Do we still need that "unless" if it's the default? If so, we need to show EnableMetadataService in the sample as well.
Whoops. This was a remnant from the previous iteration, when the metadata service was not the default.
The config is actually EnableAPI. I've reworded the sample to reflect this:
# Required (if EnableAPI = true)
...
Hey @SantoDE, fixed the …
Great job! 👍
LGTM
@martinbaillie could you squash your commits?
@ldez done!
Thanks @martinbaillie :)
We think we need backwards compatibility here.
Could you default to API instead (renaming EnableAPI to MetadataMode)?
Could you also add
type Provider struct {
APIConfiguration `mapstructure:",squash"`
// ...
}
and initialize provider.api.* if len(provider.api.accesskey) > 0?
Thanks for your great work on this 👏
@emilevauge no worries. Please see the updated code. I've reworked it back to having the API as the default, staying backwards compatible. I also updated the config naming where it made sense. A help run now looks like this, with clearly distinguished metadata service settings:
Similarly, the sample TOML reflects this.
Wow @martinbaillie 😍
@martinbaillie Sorry, we misunderstood ;). We still think having both sub-structs for API/metadata service is great. But we also think it's possible to stay backward compatible by adding a (duplicate) nested API struct to Provider.
type Provider struct {
APIConfiguration `mapstructure:",squash"`
api *APIConfiguration `description:"API configuration"`
metadata *MetadataServiceConfiguration `description:"MetaData configuration"`
// ...
}
Besides, you would need to initialize … I even think … WDYT?
@emilevauge ah, now I get you. I can make those changes tomorrow 👍 I think we could ditch the need for …
Hi @emilevauge, @ldez, I've updated the code again. Backwards compatibility is ensured and a WARN deprecation message is printed. I've also ditched any "mode" concept. So yeah, the only small bugbear is the generated help output, but I don't think I can affect that?
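A minimal Go sketch of the backwards-compatible configuration discussed above: the legacy top-level API fields stay squashed into Provider, and when a top-level access key is set they are copied into a nested API struct with a WARN deprecation message. The field names (Endpoint, AccessKey, SecretKey), the MetadataConfiguration field and the normalize method are illustrative assumptions, not Traefik's actual code. The nested fields are exported here because reflection-based config decoders only populate exported fields.

```go
package main

import (
	"fmt"
	"log"
)

// APIConfiguration holds Rancher API settings (field names are assumptions).
type APIConfiguration struct {
	Endpoint  string `description:"Rancher server API HTTP(S) endpoint"`
	AccessKey string `description:"Rancher server API access key"`
	SecretKey string `description:"Rancher server API secret key"`
}

// MetadataConfiguration holds metadata service settings (hypothetical field).
type MetadataConfiguration struct {
	Poll bool `description:"Poll every RefreshSeconds instead of long polling"`
}

// Provider keeps the legacy flat API fields (squashed) for backwards
// compatibility and adds a nested sub-struct for each source.
type Provider struct {
	APIConfiguration `mapstructure:",squash"` // deprecated top-level fields
	API              *APIConfiguration      `description:"Rancher API configuration"`
	Metadata         *MetadataConfiguration `description:"Rancher metadata service configuration"`
}

// normalize copies deprecated top-level API settings into the nested API
// struct when a top-level access key is set and no nested API section exists.
func (p *Provider) normalize() {
	if len(p.AccessKey) > 0 && p.API == nil {
		log.Println("WARN: top-level Rancher API settings are deprecated, use the nested API section")
		legacy := p.APIConfiguration // value copy, independent of the embedded struct
		p.API = &legacy
	}
}

func main() {
	p := &Provider{APIConfiguration: APIConfiguration{
		Endpoint:  "http://rancher.example.com:8080",
		AccessKey: "access",
		SecretKey: "secret",
	}}
	p.normalize()
	fmt.Println(p.API.Endpoint) // http://rancher.example.com:8080
}
```

An explicitly configured nested API section wins in this sketch, so old configs keep working while new configs never trigger the warning.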
I'm really happy about this PR <3 because we have just started integrating Traefik with Rancher in our development/experimental environments :). But one thing bothers me with the current integration: when I shut down the Rancher servers, Traefik seems to lose all information about existing services. It prints a notice to stderr (Cannot get Provider Services) and wipes the current configuration, and judging by the changes, this is still the case, right? This is less than ideal, because downtime in the Rancher servers suddenly causes major downtime and I have to bother with a Rancher server HA setup. Shouldn't Traefik keep the last known service configuration after the connection to the server is lost? (Maybe it's doable right now and I'm missing something, so please correct me if I'm wrong!)
Update: OK, found #1703 (better late than never); it seems it's a known issue. But am I wrong, or should we return the last known services as a fallback argument in case of error to …
Update 2: probably the wrong PR to discuss this issue, sorry if I took it too far :)
Hi @emq, it's true that #1703 is covering this, but it's worth mentioning here, so thanks 👍 I believe this bug only affects the API provider. This PR was mainly to introduce the Rancher metadata service as an alternate Rancher provider for Traefik; in doing so it simply refactored the API provider code into its own segregated area, but that code has otherwise been left untouched. So #1703 should tackle the bug.
The metadata service is unaffected by a Rancher server outage (another benefit of this provider). Metadata runs as a global microservice, one per host. So if they all go down, I think you've got bigger things to worry about than losing ingress routing ;)
EDIT: I should've mentioned I tested the metadata service infrastructure stack going down briefly. Traefik held its config 👍 But yeah, as above, lots of other things in the Rancher env start failing, so you'll have a bad time regardless!
You're absolutely right, totally forgot about that! Thanks for the quick reply, I can't wait to battle-test this :)
Force-pushed from c8aed2e to 2c1bdbd.
@martinbaillie I have made some changes (related to discussions with the team).
@martinbaillie Thank you very much for your hard work 👏!
@ldez you rock ❤️
LGTM
Introduces Rancher's metadata service as an optional provider source for Traefik, enabled by setting `rancher.MetadataService`. The provider uses a long-polling technique to watch the metadata service and obtain near-instantaneous updates. Alternatively, it can be configured to poll the metadata service every `rancher.RefreshSeconds` by setting `rancher.MetadataPoll`. The refactor splits the API and metadata service code into separate source files, and source-specific configuration is deferred to sub-structs. Incorporates bugfix traefik#1414.
A long running but a happy end! 😃
🎉 🎉 🎉
Thanks guys! 🎉 🎉 🎉
I've been able to do lots of zero-downtime deployments with Rancher, Traefik and the metadata service provider 👍 Caveat: if your service has a scale of … Scale of … (Not sure if it's worth mentioning that bug in the release notes? @ldez @emilevauge)
@martinbaillie, with how Rancher works, even with a scale greater than … The only way I know to do a proper blue-green deploy using only Rancher + Traefik is to spin up a new stack, which then overrides any host rules for the previous stack once the service health checks are passing.
@kelchm I didn't see that with Rancher v1.6.2 when swapping in a new microservice version sitting behind Traefik + metadata provider. Microservice has a … YMMV depending on how 'web scale' you are :) Though a quick run through …
I suppose it'll depend on the type of backend service too. When it's marked healthy, it really needs to be ready to handle traffic immediately, which is often hard with e.g. the JVM. Also, if Traefik is sitting behind something else, maybe Rancher's HAProxy LB, then you will see a few dropped packets. I'm going directly to Traefik in this test, which is not what my production looks like at the moment.
Using the API as the only source of Rancher Traefik configuration alienates some Rancher deployments, particularly in secure environments. This PR looks to resolve #1402 by allowing the user to choose between API and internal metadata service for sourcing Traefik configuration.
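The choice described here, combined with the later review decision to keep the API as the backwards-compatible default and opt into the metadata service explicitly, reduces to a small selection rule. A hedged Go sketch; rancherConfig, chooseSource and the field names are hypothetical, and preferring the metadata service when both sections are set is an assumption not stated in the thread.

```go
package main

import "fmt"

// Hypothetical configuration: a nil sub-struct means "not configured".
type apiConfig struct{ Endpoint, AccessKey, SecretKey string }
type metadataConfig struct{ Poll bool }

type rancherConfig struct {
	API      *apiConfig
	Metadata *metadataConfig
}

// source names where Traefik's Rancher configuration is read from.
type source string

const (
	sourceAPI      source = "api"
	sourceMetadata source = "metadata"
)

// chooseSource opts into the metadata service only when it is configured and
// otherwise keeps the Rancher API, the backwards-compatible default. If both
// are set, this sketch prefers the metadata service (an assumption).
func chooseSource(c rancherConfig) source {
	if c.Metadata != nil {
		return sourceMetadata
	}
	return sourceAPI
}

func main() {
	fmt.Println(chooseSource(rancherConfig{}))                            // api
	fmt.Println(chooseSource(rancherConfig{Metadata: &metadataConfig{}})) // metadata
}
```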
Some benefits of the metadata service:
… RefreshSeconds)
… /version endpoint)
Some PR notes/discussion points: