Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add controller to antrea-agent for implementing Egress #2026

Merged
merged 2 commits into from Apr 7, 2021

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Apr 2, 2021

The controller watches Egress and EgressGroup API and calls openflow
client and route client to enforce an Egress.

Co-authored-by: ceclinux ceclinux@users.noreply.github.com

For #1924

@tnqn
Copy link
Member Author

tnqn commented Apr 2, 2021

@jianjuns I haven't rebased on main branch and added any unit test. Create this PR for you to review the main code and the e2e code earlier. Please ignore changes other than the below files:

pkg/agent/controller/egress/egress_controller.go
test/e2e/egress_test.go

I will keep updating while you are reviewing.

egressIPStates: map[string]*egressIPState{},
egressBindings: map[string]*egressBinding{},
localIPDetector: &localIPDetector{},
idAllocator: newIDAllocator(256),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define a constant for the maximum number of SNAT IPs. I feel maybe 100 is good enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. But since we allocate 8 bits for the mark, no harm to allow 255 IPs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing is it consumes more mem.


type localIPDetector struct{}

func (d *localIPDetector) IsLocalIP(ip string) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can start from this, but later we should add periodic IP detection for the case SNAT IP is added after Egress is received.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I thought about it. We could use "ip monitor address" to watch the IP Address event so that we don't need to list addresses everytime we check if an IP is local or not, and don't need to do polling for IP change.
Will add a TODO if I can't make it in this release.

}
}
// Ensure datapath is configured properly.
if !ipState.datapathInstalled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, datapathInstalled is just to catch the previous failures in flow/rule installation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and to avoid extra external calls in later syncs.

if err := c.ofClient.InstallSNATMarkFlows(ipState.egressIP, ipState.mark); err != nil {
return 0, fmt.Errorf("error installing SNAT mark flows for IP %s: %v", ipState.egressIP, err)
}
if err := c.routeClient.AddSNATRule(ipState.egressIP, ipState.mark); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a problem to call InstallSNATMarkFlows() again in retry, after AddSNATRule() fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought adding same openflow rule is idempotent, no? or I could have two flags to record state of snat rule and iptables rule separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OVS side should be idempotent. But how about our flow cache?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think updating flow cache just overrides it so should be ok. Anyway I switched to 2 separate flags to not have assumption on the implementation.


// getOrAllocateMark gets or allocates a mark for a local Egress. Multiple Egresses can share same Egress IP.
// The mark is per Egress IP, not per Egress.
func (c *Controller) getOrAllocateMark(egressName, egressIP string) (uint32, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This func also installs flows and iptables rule. Call it realizeEgress?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

return true
}

// getOrAllocateMark gets or allocates a mark for a local Egress. Multiple Egresses can share same Egress IP.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same -> the same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we install flows/iptables rule in the func, add comments about these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

}
// Ensure datapath is configured properly.
if !ipState.datapathInstalled {
if err := c.ofClient.InstallSNATMarkFlows(ipState.egressIP, ipState.mark); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we install flows / rule in the caller with the lock released?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may introduce extra external calls. For example, if two new Egresses have same Egress IP, and the first one doesn't configure the datapath with the lock, the other one will also try to configure the datapath, unless we introduce per-IP locks. However, even if we add per-IP locks, at least iptables calls will be serial anyway. Do you think it's worthwhile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The per-IP lock will be necessary as there could be race condition when an Egress starts to use an EgressIP while another Egress stops to use the same EgressIP.

return nil
}
if ipState.mark != 0 {
if err := c.ofClient.UninstallSNATMarkFlows(ipState.mark); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - can we release lock before uninstall flows/rule?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

return nil
}

func (c *Controller) getEgressState(egressName string) (*egressState, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you have considered this, but just a remainder - make sure access to the returned EgressState without lock, will not create any sync issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have considered this. The mutex is to protect the map, not the egressState item. The workqueue guarantees an Egress will only be processed by a single worker at any time. So the returned EgressState has no race condition. Maybe I could add this in the comment to be clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments will be good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to the comment of the mutex.

Copy link
Member Author

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns thanks for the review


type localIPDetector struct{}

func (d *localIPDetector) IsLocalIP(ip string) (bool, error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I thought about it. We could use "ip monitor address" to watch the IP Address event so that we don't need to list addresses everytime we check if an IP is local or not, and don't need to do polling for IP change.
Will add a TODO if I can't make it in this release.

egressIPStates: map[string]*egressIPState{},
egressBindings: map[string]*egressBinding{},
localIPDetector: &localIPDetector{},
idAllocator: newIDAllocator(256),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. But since we allocate 8 bits for the mark, no harm to allow 255 IPs?

return true
}

// getOrAllocateMark gets or allocates a mark for a local Egress. Multiple Egresses can share same Egress IP.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do


// getOrAllocateMark gets or allocates a mark for a local Egress. Multiple Egresses can share same Egress IP.
// The mark is per Egress IP, not per Egress.
func (c *Controller) getOrAllocateMark(egressName, egressIP string) (uint32, error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

}
}
// Ensure datapath is configured properly.
if !ipState.datapathInstalled {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and to avoid extra external calls in later syncs.

}
// Ensure datapath is configured properly.
if !ipState.datapathInstalled {
if err := c.ofClient.InstallSNATMarkFlows(ipState.egressIP, ipState.mark); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may introduce extra external calls. For example, if two new Egresses have same Egress IP, and the first one doesn't configure the datapath with the lock, the other one will also try to configure the datapath, unless we introduce per-IP locks. However, even if we add per-IP locks, at least iptables calls will be serial anyway. Do you think it's worthwhile?

if err := c.ofClient.InstallSNATMarkFlows(ipState.egressIP, ipState.mark); err != nil {
return 0, fmt.Errorf("error installing SNAT mark flows for IP %s: %v", ipState.egressIP, err)
}
if err := c.routeClient.AddSNATRule(ipState.egressIP, ipState.mark); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought adding same openflow rule is idempotent, no? or I could have two flags to record state of snat rule and iptables rule separately.

return nil
}
if ipState.mark != 0 {
if err := c.ofClient.UninstallSNATMarkFlows(ipState.mark); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

return nil
}

func (c *Controller) getEgressState(egressName string) (*egressState, bool) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have considered this. The mutex is to protect the map, not the egressState item. The workqueue guarantees an Egress will only be processed by a single worker at any time. So the returned EgressState has no race condition. Maybe I could add this in the comment to be clear.

type localIPDetector struct{}

// IsLocalIP checks if the provided IP is configured on the Node.
// TODO: Instead of listing all IP addresses every time, it can maintain a cache and subscribe the IP address change to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need notification to controller for changes too, so controller can handle IP add/delete and update DP correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found golang API to watch ip address, will update this interface soon.

// Disable resyncing.
resyncPeriod time.Duration = 0
// maxEgressIPsPerNode is the maximum number of Egress IPs can be configured on a Node.
maxEgressIPsPerNode = 256
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 0 is for default IP, so the max number should be 255?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will fix. And switch to a lazy initialization mode to save memory

return nil
}

func (c *Controller) getEgressState(egressName string) (*egressState, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments will be good.

@tnqn tnqn force-pushed the egress-agent branch 3 times, most recently from 766c3bf to 5fdb6d2 Compare April 5, 2021 18:11
@tnqn
Copy link
Member Author

tnqn commented Apr 5, 2021

/test-all

@jianjuns
Copy link
Contributor

jianjuns commented Apr 6, 2021

@tnqn : is this PR ready for final review and merge (it still has WIP in title)?

@tnqn tnqn changed the title WIP: Add controller to antrea-agent for implementing Egress Add controller to antrea-agent for implementing Egress Apr 6, 2021
@tnqn
Copy link
Member Author

tnqn commented Apr 6, 2021

@jianjuns It's ready for review. I have made all changes except adding the verification in CRD schema. I have to add a simple one to enable e2e tests. Do you want me to add a comprehensive one in this PR? or you want to have another PR?

@jianjuns
Copy link
Contributor

jianjuns commented Apr 6, 2021

We can do another PR.

@tnqn
Copy link
Member Author

tnqn commented Apr 6, 2021

/test-all

@codecov-io
Copy link

codecov-io commented Apr 7, 2021

Codecov Report

Merging #2026 (2926972) into main (5752f96) will decrease coverage by 6.35%.
The diff coverage is 37.62%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2026      +/-   ##
==========================================
- Coverage   61.57%   55.22%   -6.36%     
==========================================
  Files         265      268       +3     
  Lines       19770    20177     +407     
==========================================
- Hits        12174    11142    -1032     
- Misses       6333     7850    +1517     
+ Partials     1263     1185      -78     
Flag Coverage Δ
kind-e2e-tests 41.20% <0.00%> (-11.67%) ⬇️
unit-tests 41.58% <37.62%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...agent/controller/egress/local_ip_detector_linux.go 0.00% <0.00%> (ø)
pkg/features/antrea_features.go 16.66% <ø> (ø)
pkg/agent/controller/egress/egress_controller.go 40.79% <40.79%> (ø)
pkg/agent/controller/egress/id_allocator.go 100.00% <100.00%> (ø)
pkg/apis/controlplane/helper.go 25.00% <0.00%> (-75.00%) ⬇️
pkg/apis/controlplane/v1beta2/helper.go 25.00% <0.00%> (-75.00%) ⬇️
pkg/controller/networkpolicy/mutate.go 0.00% <0.00%> (-71.77%) ⬇️
pkg/controller/networkpolicy/store/group.go 5.00% <0.00%> (-70.00%) ⬇️
pkg/controller/networkpolicy/validate.go 0.00% <0.00%> (-69.47%) ⬇️
pkg/k8s/name.go 33.33% <0.00%> (-66.67%) ⬇️
... and 67 more

jianjuns
jianjuns previously approved these changes Apr 7, 2021
test/e2e/egress_test.go Show resolved Hide resolved
The controller watches Egress and EgressGroup API and calls openflow
client and route client to enforce an Egress.

Co-authored-by: ceclinux <ceclinux@users.noreply.github.com>
@tnqn
Copy link
Member Author

tnqn commented Apr 7, 2021

/test-all

@tnqn
Copy link
Member Author

tnqn commented Apr 7, 2021

@jianjuns I verified the feature on IPv6 cluster and it worked. Will make the e2e test support IPv6 with another PR.

@tnqn
Copy link
Member Author

tnqn commented Apr 7, 2021

/test-all

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know IPv6 works too!

@tnqn tnqn merged commit e36c4f2 into antrea-io:main Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants