feat: support cilium cni #287

Merged: 21 commits merged into main on Jun 20, 2024
Conversation

okozachenko1203 (Member) commented Jan 4, 2024

fix #124

okozachenko1203 marked this pull request as draft on January 4, 2024 14:22
okozachenko1203 (Member, Author) commented:

https://opendev.org/openstack/magnum/src/commit/6bb2c107ffe0b9c7a278856927d64bebde3d5c36/magnum/api/validation.py#L324
The Magnum API validator hard-codes the list of supported network drivers as a constant.
There are two options to make this configurable:

  • read the allowed network_driver list from the configuration file
  • override the validator in the Magnum driver, i.e. the mcapi driver can define its own validator; this is reasonable because the driver is the component that actually carries out the work

mnaser (Member) commented Jan 10, 2024

@okozachenko1203 Maybe we can simplify the upstream change to just this:

-    supported_network_drivers = ['flannel', 'calico']
+    supported_network_drivers = ['flannel', 'calico', 'cilium']

okozachenko1203 changed the title from "feat: support cilium cin" to "feat: support cilium cni" on Jan 15, 2024
okozachenko1203 (Member, Author) commented:

> @okozachenko1203 Maybe we can simplify the upstream change to just this:
>
> -    supported_network_drivers = ['flannel', 'calico']
> +    supported_network_drivers = ['flannel', 'calico', 'cilium']

Sure, I made another patch alongside:
https://review.opendev.org/c/openstack/magnum/+/905427

mnaser (Member) commented Feb 15, 2024

@okozachenko1203 - good news, it seems upstream has merged the fix to allow it.

okozachenko1203 (Member, Author) commented:

root@kube-csqp0-rmggl-5jjss:/home/ubuntu# kubectl get po -A -owide
NAMESPACE     NAME                                             READY   STATUS    RESTARTS   AGE     IP             NODE                                    NOMINATED NODE   READINESS GATES
kube-system   cilium-c47bm                                     1/1     Running   0          77s     10.0.0.139     kube-csqp0-default-worker-bhfkv-qhg8l   <none>           <none>
kube-system   cilium-operator-68d4bbdf56-6jqh6                 1/1     Running   0          91s     10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   cilium-operator-68d4bbdf56-fjrxw                 1/1     Running   0          91s     10.0.0.139     kube-csqp0-default-worker-bhfkv-qhg8l   <none>           <none>
kube-system   cilium-qjdb7                                     1/1     Running   0          91s     10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   coredns-5d78c9869d-8876d                         1/1     Running   0          2m20s   10.100.0.158   kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   coredns-5d78c9869d-nr8m7                         1/1     Running   0          2m20s   10.100.0.57    kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   csi-cinder-controllerplugin-cd7ffbdf9-rl2fx      6/6     Running   0          91s     10.100.1.185   kube-csqp0-default-worker-bhfkv-qhg8l   <none>           <none>
kube-system   csi-cinder-nodeplugin-8fv7r                      3/3     Running   0          77s     10.0.0.139     kube-csqp0-default-worker-bhfkv-qhg8l   <none>           <none>
kube-system   csi-cinder-nodeplugin-llvr2                      3/3     Running   0          91s     10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   csi-nfs-controller-54fb58b59f-mlbvp              3/3     Running   0          89s     10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   csi-nfs-node-bpwxr                               3/3     Running   0          77s     10.0.0.139     kube-csqp0-default-worker-bhfkv-qhg8l   <none>           <none>
kube-system   csi-nfs-node-g9sql                               3/3     Running   0          89s     10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   etcd-kube-csqp0-rmggl-5jjss                      1/1     Running   0          2m20s   10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   k8s-keystone-auth-v8m5c                          1/1     Running   0          52s     10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   kube-apiserver-kube-csqp0-rmggl-5jjss            1/1     Running   0          2m20s   10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   kube-controller-manager-kube-csqp0-rmggl-5jjss   1/1     Running   0          2m20s   10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   kube-proxy-4sxm9                                 1/1     Running   0          77s     10.0.0.139     kube-csqp0-default-worker-bhfkv-qhg8l   <none>           <none>
kube-system   kube-proxy-shsfz                                 1/1     Running   0          2m20s   10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   kube-scheduler-kube-csqp0-rmggl-5jjss            1/1     Running   0          2m20s   10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   openstack-cloud-controller-manager-clqxr         1/1     Running   0          62s     10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>
kube-system   openstack-manila-csi-controllerplugin-0          4/4     Running   0          90s     10.100.1.149   kube-csqp0-default-worker-bhfkv-qhg8l   <none>           <none>
kube-system   openstack-manila-csi-nodeplugin-5fwpt            2/2     Running   0          77s     10.0.0.139     kube-csqp0-default-worker-bhfkv-qhg8l   <none>           <none>
kube-system   openstack-manila-csi-nodeplugin-p9mgp            2/2     Running   0          90s     10.0.0.247     kube-csqp0-rmggl-5jjss                  <none>           <none>

mnaser (Member) left a review comment:

Can you rebase this but also add Zuul jobs to test for both Calico and Cilium please?

mnaser (Member) commented Apr 5, 2024

@okozachenko1203 can you look into the failures?

okozachenko1203 (Member, Author) commented:

Failed tests:
[It] [sig-network] Services should have session affinity work for service with type clusterIP [LinuxOnly] [Conformance]
[It] [sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]
[It] [sig-network] Services should be able to switch session affinity for service with type clusterIP [LinuxOnly] [Conformance]

okozachenko1203 (Member, Author) commented:

@mnaser now only one test case is failing in the conformance test, and it is an upstream issue: cilium/cilium#14287

mnaser (Member) commented Apr 10, 2024

@okozachenko1203 Can you try enabling kube-proxy replacement mode and see if all tests pass?

okozachenko1203 (Member, Author) commented:

> @okozachenko1203 Can you try enabling kube-proxy replacement (KPR) mode and see if all tests pass?

@mnaser this means we would have to provision the CAPI cluster without kube-proxy first, then enable kube-proxy replacement mode in the Cilium chart's values.
OK, I can see CABPK supports this already: kubernetes-sigs/cluster-api#5993
Btw, do we have to expose a new config parameter to reflect whether KPR mode is enabled?
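
As a rough sketch of the CABPK option mentioned above, skipping the kube-proxy addon could look something like this on the control plane object (names are hypothetical and unrelated required fields are omitted; the skipPhases field is the one added by the linked cluster-api change):

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: example-control-plane        # hypothetical name
spec:
  kubeadmConfigSpec:
    initConfiguration:
      # skip the kube-proxy addon so Cilium's kube-proxy replacement can take over
      skipPhases:
        - addon/kube-proxy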

okozachenko1203 (Member, Author) commented:

Following this guide, we need to specify the kube API server IP and port as Helm values, but I don't think we can get them in advance.
@mnaser what do you think?
[screenshot of the referenced guide]
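
For context, the values the guide asks for would map to something like the following in the Cilium chart (a sketch only; the host and port below are placeholders, since the workload cluster's API endpoint is exactly what is not known when these values are rendered, and the accepted kubeProxyReplacement values differ between chart versions):

kubeProxyReplacement: strict   # older charts use strict/partial, newer ones a boolean
k8sServiceHost: 203.0.113.10   # placeholder: workload cluster API server address
k8sServicePort: 6443           # placeholder: API server port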

mnaser (Member) commented Apr 11, 2024

@okozachenko1203 hmm, that would be quite a challenge in that case, because I think enabling this option is quite a bit more involved than just a small change of Helm values.

mnaser (Member) commented Apr 11, 2024

@okozachenko1203 Can you see how difficult it might be to enable Cilium to make this change happen with kube-proxy replacement out of the box? I think we'll have to look into some sort of 'reconciliation' loop.

okozachenko1203 (Member, Author) commented:

> @okozachenko1203 Can you see how difficult it might be to enable Cilium to make this change happen with kube-proxy replacement out of the box? I think we'll have to look into some sort of 'reconciliation' loop.

Yeah, provisioning the cluster without kube-proxy should be simple by leveraging the skipPhases option.
Then we would have to reapply the ClusterResourceSet ConfigMap for the Cilium chart after fetching the cluster's kube API IP and port, and I think that part will be tricky.
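
A minimal sketch of the ClusterResourceSet side of that idea (all names and labels are hypothetical; the point is that the ConfigMap holding the rendered Cilium manifests would have to be regenerated and re-applied once the real API endpoint is known):

apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: example-cilium               # hypothetical name
spec:
  clusterSelector:
    matchLabels:
      cni: cilium                    # hypothetical label selecting the workload cluster
  resources:
    - kind: ConfigMap
      name: example-cilium-manifests # ConfigMap with the rendered Cilium chart manifests
  strategy: Reconcile                # re-apply when the ConfigMap changes (ApplyOnce would not)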

mnaser (Member) left a review comment:

cni-chaining-mode: portmap
enable-session-affinity: 'true'

@okozachenko1203 can you just use these instead and keep the rest of the setup as it is?

reference: cilium/cilium#14287 (comment)

For users who run with kube-proxy (i.e. with Cilium's kube-proxy replacement disabled), the ClusterIP service loadbalancing when a request is sent from a pod running in a non-host network namespace is still performed at the pod network interface (until cilium/cilium#16197 is fixed). For this case the session affinity support is disabled by default.

Failure of `[sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]` is expected until cilium/cilium#14287 is fixed.
okozachenko1203 (Member, Author) commented:

> cni-chaining-mode: portmap
> enable-session-affinity: 'true'
>
> @okozachenko1203 can you just use these instead and keep the rest of the setup as it is?
>
> reference: cilium/cilium#14287 (comment)

With this portmap chaining mode, Sonobuoy passed for the Cilium CNI.
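
For reference, those two ConfigMap keys correspond roughly to Helm values like the following (a sketch; the exact value names, sessionAffinity in particular, are assumptions that should be checked against the Cilium chart version in use):

cni:
  chainingMode: portmap   # chain Cilium with the portmap plugin so hostPort keeps working
sessionAffinity: true     # keep ClusterIP session affinity working without kube-proxy replacement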

mnaser merged commit 4f922d0 into main on Jun 20, 2024 (11 of 12 checks passed).
mnaser deleted the support-cilium branch on June 20, 2024 at 21:29.
Successfully merging this pull request may close: Support for Cilium as network-plugin (#124).