
Only listen to configured k8s namespaces. #1895

Merged

Conversation

timoreimann
Contributor

Watch configured namespaces only.

We previously watched all namespaces, which was causing issues for users that want to apply restrictive RBAC rules. Hence, we change the logic to hook up one informer per namespace and kind we observe.

Note that this moves the namespace filtering into the client/informers (where it properly belongs). A nice side effect of this is that we stop receiving tons of events from unrelated namespaces (most notably kube-system), which has been a source of confusion for users in the past. We also don't need the extra event notifications as "retry signals" in case of ephemeral API failures anymore since we can now safely rely on a fairly low sync period for recovery purposes. This is preferable to getting hammered with event updates at sub-second intervals, only to have Traefik's back-off sleep of 2 seconds kicking in very often.
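For illustration only, here is a minimal sketch of the per-namespace wiring described above, using a namespace-scoped ListWatch from client-go (the function name, parameters, and the use of the networking/v1 Ingress API are assumptions, not Traefik's actual code):

```go
package k8ssketch

import (
	"time"

	networkingv1 "k8s.io/api/networking/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newIngressInformers is illustrative only: it creates one Ingress informer
// per configured namespace. Because each ListWatch is scoped to a single
// namespace, restrictive RBAC rules only need to grant list/watch access to
// those namespaces, and events from unrelated namespaces (e.g. kube-system)
// never reach us.
func newIngressInformers(cs *kubernetes.Clientset, namespaces []string, resync time.Duration, eventCh chan<- interface{}) []cache.SharedInformer {
	var informers []cache.SharedInformer
	for _, ns := range namespaces {
		lw := cache.NewListWatchFromClient(cs.NetworkingV1().RESTClient(), "ingresses", ns, fields.Everything())
		inf := cache.NewSharedInformer(lw, &networkingv1.Ingress{}, resync)
		inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
			AddFunc:    func(obj interface{}) { eventCh <- obj },
			UpdateFunc: func(_, obj interface{}) { eventCh <- obj },
			DeleteFunc: func(obj interface{}) { eventCh <- obj },
		})
		informers = append(informers, inf)
	}
	return informers
}
```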

A few additional improvements that made it in:

  • Wait for the store synchronization to happen during WatchAll instead of when we process events. The wait needs to happen exactly once.
  • Do not close the event channel until all controllers have stopped processing. This was a pending race condition that could lead to a crash.
  • Remove one superfluous event/watch channel (name changes depending on which method you're looking at).
  • Remove the namespace-related unit tests. (Must be replaced by to-be-written integration tests.)
  • Group methods logically.
  • Improve GoDocs.

Finally, we stop logging complete events as we receive them to reduce noise and stop leaking sensitive secrets data into the logs.
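As an aside, a hedged sketch of what "log the object key instead of the whole object" can look like with client-go (the helper name is hypothetical, not Traefik's code):

```go
package k8ssketch

import (
	"log"

	"k8s.io/client-go/tools/cache"
)

// logEventKey is illustrative: instead of dumping the full object (which for
// Secrets would leak sensitive data into the logs), log only its
// namespace/name key.
func logEventKey(obj interface{}) {
	key, err := cache.MetaNamespaceKeyFunc(obj)
	if err != nil {
		log.Printf("received event for object of type %T", obj)
		return
	}
	log.Printf("received event for %q", key)
}
```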

Fixes #1626.

@timoreimann
Contributor Author

timoreimann commented Jul 25, 2017

This is still WIP, mostly because a proper integration test is still missing (#1889). Nevertheless, I'd be happy to hear some early feedback (from everyone in general and in particular from @containous/kubernetes).

@ldez ldez requested a review from a team July 25, 2017 18:46
@timoreimann
Contributor Author

It looks like getting Kubernetes integration tests up and running will still require a bit of effort. Nevertheless, merging the PR is advisable IMO as Traefik can be operated in a much more secure way on Kubernetes with namespace-specific watches. Plus, we stop spamming the logs with tons of unrelated events.

I'll remove the WIP label for now and suggest we test this manually like we did with the previous changes.

@timoreimann
Contributor Author

Note to myself: The PR currently sets the default namespace by default. Depending on how #1961 resolves, I may need to change that to all namespaces.

@ldez ldez added the size/L label Aug 16, 2017
@emilevauge
Member

Wow @timoreimann!
Design LGTM
As you completely refactored this provider, I would strongly recommend running some benchmarks and profiling to ensure no memory leaks have been introduced :)

@errm
Contributor

errm commented Aug 18, 2017

@timoreimann this is on my radar and I will have a thorough review done by early next week.

RE the all-namespaces thing: my setup currently depends on this, so we should certainly keep it as the default IMO.

@timoreimann timoreimann force-pushed the k8s-only-listen-to-configured-namespaces branch from 24172a9 to 98e6ba8 on August 18, 2017 23:34
@timoreimann
Contributor Author

@errm I have pushed a commit that makes NamespaceAll the default (which should be in line with the now-merged #1961).
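For reference, a minimal sketch of such a fallback, assuming a hypothetical helper name; metav1.NamespaceAll is client-go's constant for the empty string, which selects all namespaces:

```go
package k8ssketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// effectiveNamespaces is a hypothetical helper: when no namespaces are
// configured, fall back to watching all of them via metav1.NamespaceAll
// (the empty string).
func effectiveNamespaces(configured []string) []string {
	if len(configured) == 0 {
		return []string{metav1.NamespaceAll}
	}
	return configured
}
```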

Contributor

@errm left a comment

Looks very nice and clean to me...

Subject to @emilevauge's concerns about benchmarking / soak testing, LGTM

@timoreimann
Contributor Author

Yeah I'll definitely try to work out some kind of benchmark.

Probably not going to happen before the 1.4 freeze, but not critical either.

Contributor

@dtomcej left a comment

This looks super clean and boss.

LGTM
:shipit:

@ldez ldez added this to the 1.5 milestone Aug 25, 2017
@@ -19,31 +20,55 @@ import (
	"k8s.io/client-go/tools/cache"
)

-const resyncPeriod = time.Minute * 5
+const resyncPeriod = 10 * time.Second
@aledbf

@timoreimann this is too low; it should be more like every 10 minutes. Just so you know: with this you are forcing a full sync (from the API server) every 10 seconds.

Contributor Author

AFAIU, there will be syncs only for the restricted items each informer is watching. Or am I missing something?

I'm okay with increasing the period in general. Syncing every 10 minutes though sounds like we'd not be catching up on missed Ingresses, Services, and Endpoints for a fairly long time. I wonder if that would still be acceptable?

@aledbf Aug 25, 2017

full resync != delta updates (like Ingresses, Services, and Endpoints)

Edit: please check the gce ingress controller
https://github.com/kubernetes/kubernetes/blob/master/cluster/saltbase/salt/l7-gcp/glbc.manifest#L47

Contributor Author

AFAIK, the glbc watches all namespaces. With this PR, Traefik will be watching selective namespaces only. What's being pulled in by a full sync should be subject to the list part of the ListWatch interface. So to my understanding, the sync volume should be much more restricted.

Please let me know if my assumptions are wrong somehow.

@aledbf

So to my understanding, the sync volume should be much more restricted.

Yes, that's correct, but you are assuming the namespace only contains a few objects (Ingresses, Services, Secrets, etc.). What happens if the user has hundreds of objects in the namespace(s)?

Contributor Author

I think it's a trade-off between not overloading the cluster and being able to compensate for ephemeral errors in time. Maybe there's no one-size-fits-all, and it'd be best to make the parameter configurable.

I agree though that we can probably bump the interval somewhat for now. Will go with a minute or so.

Contributor Author

@aledbf you might be interested in this thread I started on the kubernetes-sig-api-machinery mailing list. Some takeaways for me:

  • The resync period only causes the local cache content to get replayed; in particular, it does not hit the API server (that's what an explicit relist is for).
  • First and foremost, the resync period is a means to compensate for controller (not watch) bugs, though re-queuing techniques available in more recent informer versions take away a lot of that rationale. That said, resyncs may also be used for other cases. (Daniel describes how you can use them as a clock to initiate polls for external APIs periodically.)
  • It's a viable option to disable resyncs completely in order to surface controller bugs more quickly instead of risking hiding them. One example component that does this is the Kubernetes scheduler.

Hope this helps.
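To make the trade-off concrete, here is a small sketch of where the resync period plugs into a client-go shared informer (the helper name and parameters are illustrative, not Traefik's API):

```go
package k8ssketch

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

// newServiceInformer shows where the resync period plugs in. Per the
// mailing-list takeaways above, the period only controls how often the
// informer replays its local cache to the event handlers; it does not
// trigger a relist against the API server. A value of 0 disables
// periodic resyncs entirely.
func newServiceInformer(lw cache.ListerWatcher, resync time.Duration) cache.SharedInformer {
	// e.g. resync = 0           -> rely purely on watch events
	//      resync = time.Minute -> handlers periodically re-see cached objects
	return cache.NewSharedInformer(lw, &corev1.Service{}, resync)
}
```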

The lookup key is now dependent on whether we use NamespaceAll or more specific namespaces.

Shared informers are the standard informer types in more modern releases of client-go, so we can only benefit by using them now as well (even though we don't share any caches and presumably won't do so in the near future).

Additionally, we replace the individual Watch* methods by a single WatchObjects method.
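Purely as an illustration of the "wait for the store synchronization exactly once during WatchAll" point, a minimal sketch using client-go's WaitForCacheSync (names are hypothetical, not the actual implementation):

```go
package k8ssketch

import "k8s.io/client-go/tools/cache"

// startAndSyncOnce is a hypothetical helper: start all informers, then
// block a single time until every local store has synced, instead of
// checking synchronization while processing each event.
func startAndSyncOnce(informers []cache.SharedInformer, stopCh <-chan struct{}) bool {
	var synced []cache.InformerSynced
	for _, inf := range informers {
		go inf.Run(stopCh)
		synced = append(synced, inf.HasSynced)
	}
	return cache.WaitForCacheSync(stopCh, synced...)
}
```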
@timoreimann timoreimann force-pushed the k8s-only-listen-to-configured-namespaces branch from 6c85c79 to d0ac8f5 on October 4, 2017 23:14
@timoreimann
Contributor Author

timoreimann commented Oct 5, 2017

I finally got around to running benchmarks. This is what I did:

  1. Submitted 30 deployments into 3 namespaces with 10 deployments per namespace.
  2. Added Ingresses and Services for each Deployment.
  3. Started Traefik with --debug.
  4. Fired off a script that continuously rescaled all Deployments to a replica count between 1 and 3.
  5. Observed the memory consumption and goroutine count via expvarmon (a minimal expvar sketch follows right after this list).
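For context on step 5, a hedged sketch of the counters expvarmon reads: importing expvar exposes /debug/vars with memstats out of the box, while a goroutine gauge has to be published explicitly. This is generic Go, not Traefik's code:

```go
package main

import (
	"expvar"
	"log"
	"net/http"
	"runtime"
)

func main() {
	// Importing expvar registers /debug/vars on http.DefaultServeMux and
	// publishes runtime.MemStats automatically; goroutine counts are not
	// included by default, so publish them explicitly.
	expvar.Publish("goroutines", expvar.Func(func() interface{} {
		return runtime.NumGoroutine()
	}))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```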

Here's what I observed:

  • The number of goroutines varied in a low range between ~110 and ~120. This is higher than the pre-PR count where I could measure ~45. However, this makes total sense given that we now run 4 informers per namespace versus 4 informers overall. The fact that watching a single namespace with the new implementation consumes ~45 goroutines as well matches up nicely with the expected math.
  • The memory consumption varied between 5 and 10 MB. It would rise and fall even during my load tests, which shows that GC manages to reclaim memory successfully.
  • After stopping the rescaling script, memory consumption dropped as well after a short while. This shows we're not leaking memory.

I also verified that the number of backend servers matches the pod count.

Overall, this looks good to me. We surely want to gain more real world experience to solidify the new implementation. I do think though that the tests demonstrate basic correctness.

Happy to run more tests if desired.

@timoreimann
Contributor Author

@dtomcej @errm since you first reviewed, I pushed a few more commits (mostly just refactorings). Are you still okay with the status quo of this PR?

Asking bot not to merge until I get another approval confirmation.

@timoreimann timoreimann added the WIP label Oct 7, 2017
@timoreimann
Contributor Author

timoreimann commented Oct 7, 2017

Briefly going back to WIP as I realized RBAC documentation updates are still missing...

@timoreimann timoreimann removed the WIP label Oct 7, 2017
@timoreimann
Contributor Author

timoreimann commented Oct 7, 2017

...aaaaand it's done.

@timoreimann
Contributor Author

timoreimann commented Oct 8, 2017

To all involved reviewers: Please indicate by Monday midnight CEST (i.e., in a good 22 hours from now) if your existing review should no longer hold; I'm happy to address any further comments.

Otherwise, I'll proceed with merging the PR.

@emilevauge
Member

emilevauge commented Oct 9, 2017

@timoreimann I think this is a good example that WIP can be confusing sometimes :) Do we need to dismiss all reviews done before you removed the WIP label?

@timoreimann
Contributor Author

@emilevauge strictly speaking, I guess every new commit should invalidate all existing LGTMs.

Practically speaking, I'm not sure if we want to be that strict, especially since it'd take another three approvals.

For the PR at hand, the changes that happened after the LGTMs were received are rather simple / insignificant. So personally I feel confident moving forward even without another round of eyeballs.

Either way, we may want to consider tuning the process to add clarity.

@emilevauge
Member

@timoreimann

strictly speaking, I guess every new commit should invalidate all existing LGTMs.

Yes, but here I'm talking about the fact that this PR was already reviewed by 3 maintainers while still in WIP. Having commits follow a review is normal and easy to follow in GitHub (especially if the contributor produces well-named commits). In this case, I'm not able to say if it's safe to merge or not.

Just saying we should really be cautious with WIP PRs in the future ;)

Development

Successfully merging this pull request may close these issues.

Kubernetes: Problems with RBAC & multiple Namespaces
8 participants