Upcoming operator component configuration changes #1990

Closed
lmm opened this issue Jun 1, 2022 · 14 comments
Labels
kind/enhancement New feature or request

Comments

@lmm
Contributor

lmm commented Jun 1, 2022

Hey everyone!

We are planning on adding new fields to the operator API to allow more configuration of the resources managed by the operator. We would appreciate your feedback on these proposed changes.

Why?

The goal of these proposed changes is to meet some of the use cases highlighted in existing GitHub issues.

The proposed changes

We will add new fields to allow configuring a component resource (such as the calico-node daemonset, the calico-kube-controllers deployment, etc.). These fields will include (depending on the component):

  • annotations and labels
  • node affinity and node selectors
  • tolerations
  • minReadySeconds
  • container resources

We will start by adding configuration to the core Calico components:

  • calico-node (calico-node daemonset)
  • calico-typha (calico-typha deployment)
  • calico-kube-controllers (calico-kube-controllers deployment)
  • calico-apiserver (calico-apiserver deployment)

The new fields will be backed by types that closely resemble the upstream Kubernetes resources (daemonset, deployment) that are being configured. The core Calico components calico-node, calico-typha, and calico-kube-controllers are managed and configured in the Installation CRD, so the new fields for these components will be added to the InstallationSpec.
The new field to configure the calico-apiserver component will be added to the APIServerSpec. (In general, the new fields will be added to the component CRD that manages/configures those resources.)

Here is an example Installation resource with these new fields (note: we have not ironed out the granular details of the API changes such as field names, etc.):

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNodeTemplate:
    metadata:
      labels:
        daemonset-label: value
      annotations:
        daemonset-annot: value
    spec:
      minReadySeconds: 10
      podTemplate:
        metadata:
          labels:
            podtemplate-label: value-two
          annotations:
            foo: bar
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values:
                    - linux
          nodeSelector:
            kubernetes.io/hostname: testnode
          tolerations:
          - key: "example-key"
            operator: "Exists"
            effect: "NoSchedule"
          containers:
          - name: calico-node
            resources:
              requests:
                cpu: 100m
                memory: 1G
              limits:
                cpu: 100m
                memory: 1G
  kubeControllersTemplate:
    spec:
      podTemplate:
        metadata:
          labels:
            foo: bar
        spec:
          nodeSelector:
            node-role.kubernetes.io/control-plane: "true"
  typhaTemplate:
    spec:
      podTemplate:
        spec:
          affinity: 
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.azure.com/agentpool
                    operator: In
                    values:
                    - nodepool1
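
The same pattern would apply to the APIServer CRD mentioned above. A rough sketch, where the field name (apiServerTemplate) and the resource values are illustrative since the final details are not settled:

apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec:
  apiServerTemplate:   # hypothetical field name, not final
    spec:
      podTemplate:
        spec:
          containers:
          - name: calico-apiserver
            resources:
              requests:
                cpu: 100m
                memory: 128Mi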

In rolling out these new fields, we will deprecate the following fields in the InstallationSpec:

  • componentResources (because the new fields provide a way to configure container resources for the core Calico components)
  • typhaAffinity (because the new fields will allow customizing Typha's affinity and more)

In the near future, we will add support for component resource configuration in the same way for all other components managed by the operator (such as ALP and Calico Enterprise components).

Feedback

Please let us know what you think of these changes!

@aquam8

aquam8 commented Jun 7, 2022

That's brilliant @lmm! That covers the strategic properties that need to be controlled by end-users.

The only thing I was wondering about is the impact on the existing controlPlaneTolerations property.

Thanks!

@alzabo

alzabo commented Jun 7, 2022

This looks great! Thanks!

@nuriel77

nuriel77 commented Jun 7, 2022

Very nice!

Question:
Under calicoNodeTemplate shouldn't the affinity, nodeSelector and tolerations go under the podTemplate.spec? (That seems to be the case with kubeControllersTemplate and typhaTemplate)

@bcbrockway

I like the approach, but we would reeeaaaally like to have topologySpreadConstraints included in this change. We've switched our whole estate over to this from antiAffinity because there were still cases where some pods could end up being scheduled on the same node: antiAffinity only works on a best-effort basis, whereas topologySpreadConstraints let us tell the scheduler not to schedule a pod at all until the cluster autoscaler has had time to spin up another node.
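
For reference, assuming the new podTemplate.spec mirrors the upstream Kubernetes pod spec, a topologySpreadConstraints block slotted into the proposed format might look like this (the label selector is illustrative):

typhaTemplate:
  spec:
    podTemplate:
      spec:
        topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              k8s-app: calico-typha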

@stevehipwell
Contributor

This looks great.

@lmm
Contributor Author

lmm commented Jun 7, 2022

The only thing I was wondering about is the impact on the existing controlPlaneTolerations property.

@aquam8 the current plan is to keep the existing controlPlane* fields. We haven't worked out all of the details of the new config yet, but one way this could work is that controlPlaneTolerations would be used as the default; if the tolerations for a specific component resource are set, those are used instead of controlPlaneTolerations.
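
As a sketch of that precedence (field names are not final, and the toleration keys are illustrative), controlPlaneTolerations would supply the default while a component-level setting overrides it:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  controlPlaneTolerations:      # default for control-plane components
  - key: "dedicated"
    operator: "Exists"
    effect: "NoSchedule"
  kubeControllersTemplate:      # per-component tolerations would take precedence
    spec:
      podTemplate:
        spec:
          tolerations:
          - key: "critical-addons"
            operator: "Exists"
            effect: "NoSchedule"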

@lmm
Contributor Author

lmm commented Jun 7, 2022

Under calicoNodeTemplate shouldn't the affinity, nodeSelector and tolerations go under the podTemplate.spec? (That seems to be the case with kubeControllersTemplate and typhaTemplate)

@nuriel77 oops, yes you're right. Let me fix that.

Edit: thanks @nuriel77 - I've updated the example now.

@aquam8

aquam8 commented Jun 7, 2022

Would you mind providing an example of how the newly suggested properties would fit into the Helm customization of the Operator (after deprecating componentResources)?

Today I have:

apiServer:
  enabled: false
installation:
  enabled: true
  kubernetesProvider: "EKS"
  componentResources:
    - componentName: Node
      resourceRequirements:
        requests:
          memory: "64Mi"
          cpu: "40m"
        limits:
          memory: "128Mi"
          # no cpu limits
    - componentName: Typha
      resourceRequirements:
        requests:
          memory: "64Mi"
          cpu: "40m"
        limits:
          memory: "96Mi"
          # no cpu limits
  controlPlaneTolerations:
    - effect: NoSchedule
      operator: Exists
      key: "dedicated"

@lmm
Contributor Author

lmm commented Jun 7, 2022

We've switched our whole estate over to this from antiAffinity because there were still cases where some pods could end up being scheduled on the same node: antiAffinity only works on a best-effort basis, whereas topologySpreadConstraints let us tell the scheduler not to schedule a pod at all until the cluster autoscaler has had time to spin up another node.

@bcbrockway does antiAffinity with requiredDuringSchedulingIgnoredDuringExecution not work in your situation? I'd like to understand your use case.
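
For context, the "hard" anti-affinity variant in a standard pod spec looks like the fragment below (the label selector is illustrative); unlike the preferred form, it blocks scheduling outright rather than acting on a best-effort basis:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          k8s-app: calico-typha
      topologyKey: kubernetes.io/hostname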

@lmm
Contributor Author

lmm commented Jun 7, 2022

Would you mind providing an example of how the newly suggested properties would fit into the Helm customization of the Operator (after deprecating componentResources)?

@aquam8 I think it would look something like this (I've translated your example to the proposed format):

apiServer:
  enabled: false
installation:
  enabled: true
  kubernetesProvider: "EKS"
  calicoNodeTemplate:
    spec:
      podTemplate:
        spec:
          containers:
          - name: calico-node
            resources:
              requests:
                memory: "64Mi"
                cpu: "40m"
              limits:
                memory: "128Mi"
                # no cpu limits
  typhaTemplate:
    spec:
      podTemplate:
        spec:
          containers:
          - name: calico-typha
            resources:
              requests:
                memory: "64Mi"
                cpu: "40m"
              limits:
                memory: "96Mi"
                # no cpu limits
  controlPlaneTolerations:
    - effect: NoSchedule
      operator: Exists
      key: "dedicated"

@aquam8

aquam8 commented Jun 7, 2022

Awesome @lmm! Of course I would add the tolerations block in each podTemplate.spec when that becomes available through this PR.
Thank you!

@lmm
Contributor Author

lmm commented Jul 22, 2022

The proposed changes have now been merged in #2063 (with some follow-up changes to come).

@mohamed-elmadeny

Which Helm version reflects these changes?

@tmjd
Member

tmjd commented Nov 16, 2022

The Calico v3.24 release has these changes. You can find the options in the Installation Reference.
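
For anyone finding this later, a rough sketch of the shape that shipped, assuming the calico-node override field landed as calicoNodeDaemonSet (the Installation Reference is the authoritative source for the exact field names and structure, and the resource values here are illustrative):

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNodeDaemonSet:   # assumed field name; see the Installation Reference
    spec:
      template:
        spec:
          containers:
          - name: calico-node
            resources:
              requests:
                cpu: 100m
                memory: 128Mi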
