Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

2924: Preserve the client source IP #3298

Merged
merged 13 commits into from
Jun 11, 2018

Conversation

brb
Copy link
Contributor

@brb brb commented May 13, 2018

This PR implements a mechanism for preserving the client source IP addr when accessing a Service annotated with service.spec.externalTrafficPolicy=Local.

The idea for preserving the source IP addr has been presented by #2924 (comment).

If the --no-masq-local flag is passed to weaver, it will dynamically maintain the weave-no-masq-local ipset containing CIDRs allocated by IPAM for the particular weaver. The ipset is used by the iptables rule: iptables -t nat -I WEAVE 1 -m set --match-set weave-no-masq-local dst -j RETURN. The rule prevents external traffic to be MASQUERADEd before being sent to a Pod.

I suggest to read each commit separately when reviewing this PR.

Fixes #2924.

@brb brb added this to the 2.4 milestone May 13, 2018
@brb brb force-pushed the issues/2924-preserve-src-ip-on-local branch from bff0984 to 9792df4 Compare May 14, 2018 12:03
@brb brb requested a review from bboreham May 22, 2018 14:31
@bboreham
Copy link
Contributor

Just trying to understand this, without having been immersed in the details lately, I think the following explanation is useful:

Consider a packet arriving on node A from source S destined for pod P.
If P is on node A, then the reply can be sent straight to S. (however it got here, Linux and/or the network must know how to get it back)
If P is on node B, then a reply from B to S will be unexpected and/or not work.
Masquerading the packet as coming from node A means the reply from P on B will come via A, which is good.
So, this PR attempts to non-masquerade all cases where P is on A while still masquerading the cases where P is on B.

@brb can you confirm?

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok, if I understood the overall point.
A few comments below.

@@ -529,3 +537,27 @@ func ensureRulesAtTop(table, chain string, rulespecs [][]string, ipt *iptables.I

return nil
}

func reexpose(config *BridgeConfig, log *logrus.Logger) error {
// Get existing IP addrs of the weave bridge.

This comment was marked as abuse.

net/bridge.go Outdated
@@ -463,7 +469,10 @@ func configureIPTables(config *BridgeConfig) error {

if !config.NPC {
// Create a chain for allowing ingress traffic when the bridge is exposed
_ = ipt.NewChain("filter", "WEAVE-EXPOSE")
if err := ipt.ClearChain("filter", "WEAVE-EXPOSE"); err != nil {

This comment was marked as abuse.

net/bridge.go Outdated

return ipt.AppendUnique("nat", "POSTROUTING", "-j", "WEAVE")
// k8s-only: Create the ipset to store CIDRs allocated by IPAM for local pods.

This comment was marked as abuse.

@@ -56,7 +57,7 @@ func resetIPTables(ipt *iptables.IPTables) error {
}

func resetIPSets(ips ipset.Interface) error {
// Remove ipsets prefixed `weave-` only
// Remove ipsets prefixed `weave-` excluding weave-no-masq-local (used by weaver).

This comment was marked as abuse.

This comment was marked as abuse.

net/bridge.go Outdated
if err := ips.Create(NoMasqLocalIpset, ipset.HashNet); err != nil {
return err
}
if err := ipt.Insert("nat", "WEAVE", 1, "-m", "set", "--match-set", string(NoMasqLocalIpset), "dst", "-j", "RETURN"); err != nil {

This comment was marked as abuse.

@@ -199,6 +199,6 @@ func (alloc *Allocator) HandleHTTP(router *mux.Router, defaultSubnet address.CID
})

router.Methods("GET").Path("/ipinfo/tracker").HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, tracker)
fmt.Fprintf(w, alloc.tracker.String())

This comment was marked as abuse.

This comment was marked as abuse.

return "no-masq-local"
}

func (t *NoMasqLocalTracker) HandleUpdate(prevRanges, currRanges []address.Range, local bool) error {

This comment was marked as abuse.

This comment was marked as abuse.

net/bridge.go Outdated

func NewNoMasqLocalTracker() *NoMasqLocalTracker {
return &NoMasqLocalTracker{
// TODO(mp) Reuse ipset object as each time calling New creates a test set.

This comment was marked as abuse.

This comment was marked as abuse.

@@ -186,6 +202,7 @@ func main() {
http.HandleFunc("/read", state.serveRead)
http.HandleFunc("/write", state.serveWrite)
http.HandleFunc("/status", state.serveStatus)
http.HandleFunc("/client_ip", state.serveClientIP)

This comment was marked as abuse.


CLIENT_IP_MASQ="$($SSH $HOST1 curl -sS http://$HOST2:31138/client_ip)"

WEAVE_ENV_VARS="${WEAVE_ENV_VARS}\\n - name: NO_MASQ_LOCAL\\n value: \"1\""

This comment was marked as abuse.

This comment was marked as abuse.

@brb brb force-pushed the issues/2924-preserve-src-ip-on-local branch 2 times, most recently from 20c1526 to ef42d9b Compare June 4, 2018 11:13
brb added 11 commits June 4, 2018 14:06
This ensures that all exposing-related iptables rules are created.

As we do the re-expose, we need to clear the WEAVE-EXPOSE chain
if such exists. Thus, we use ClearChain instead of NewChain to
create the chain.
It's going to be used by net/bridge.go.
The ipset is used to track locally allocated CIDRs. If requested,
external traffic sent to pods within these CIDR ranges will
avoid SNAT'ing (NYI).
Cleaner approach than passing a tracker name to ipam/http.
It's going to be used by NoMasqLocalTracker.
The tracker updates the "no-masq-local" ipset with CIDR ranges allocated
by IPAM.
One can enable the feature by setting NO_MASQ_LOCAL=1 among
env vars of the weave-kube DaemonSet.
The endpoint is going to be used by --no-masq-local integration tests.
Also, increase timeout for smoke tests to 360 sec.
@brb
Copy link
Contributor Author

brb commented Jun 4, 2018

@brb can you confirm?

ACK.

Additional useful info: as mentioned in the code, this works only on k8s, as in the vanilla Docker case, the Docker bridge acts as a default gw for a container, while in the k8s case - the weave bridge. So in the Docker case, if S sends a packet to P, a reply to it might be sent via different interface which can result in dropped reply due to rp_filter.

@brb brb self-assigned this Jun 4, 2018
@bboreham
Copy link
Contributor

bboreham commented Jun 4, 2018

It might be worth commenting that "Kubernetes only" is a shorthand for "containers where the Weave Net interface is the default gateway", since the latter can be achieved using vanilla Docker and weave attach, or via some other orchestrator using the Weave Net CNI plugin.

@bboreham
Copy link
Contributor

bboreham commented Jun 4, 2018

Is this PR ready for final review now?

- Track only local changes.
- Add comment to the iptables rule which prevents SNAT'ing.
- Rename the ipset to weaver-no-masq-local to prevent NPC from destroying it.
- Fix comment about create/flush WEAVE-EXPOSE.
- Add comment about reexpose being no-op if bridge hasn't been exposed.
- Add comment that the feature is not only for Kubernetes.
- Reuse the ipset instance.
@brb brb force-pushed the issues/2924-preserve-src-ip-on-local branch from ef42d9b to 4c13639 Compare June 4, 2018 17:28
@brb brb removed their assignment Jun 4, 2018
@brb brb force-pushed the issues/2924-preserve-src-ip-on-local branch from 4c13639 to 3daa028 Compare June 4, 2018 17:31
@brb
Copy link
Contributor Author

brb commented Jun 4, 2018

Thanks for the review. Yep, it's ready for final review. Two last commits are relevant ones.

As ipset.New(...) can be called by two processes (weaver and weave-npc)
at the same time, there is a possibility for a race when they both
check the comment support. To prevent the race, we append a random
nonce to the test ipset name.
@bboreham bboreham merged commit 812c582 into master Jun 11, 2018
@bboreham
Copy link
Contributor

Some discussion on Twitter about whether we should default this on. Thoughts?

@rade
Copy link
Member

rade commented Aug 31, 2018

Some discussion on Twitter about whether we should default this on.

See #3389

@brb
Copy link
Contributor Author

brb commented Sep 3, 2018

Some discussion on Twitter about whether we should default this on. Thoughts?

Could you link to the discussion?

We can make it default only for cases where the weave bridge is a default gw for containers (do we know any k8s deployment where it is not the case?).

@bboreham bboreham deleted the issues/2924-preserve-src-ip-on-local branch December 24, 2018 12:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants