Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update priority class logic #1473

Merged
merged 1 commit into from
Aug 23, 2021
Merged

Update priority class logic #1473

merged 1 commit into from
Aug 23, 2021

Conversation

stevehipwell
Copy link
Contributor

@stevehipwell stevehipwell commented Aug 23, 2021

Description

Use the Kubernetes system priority classes, system-node-critical & system-cluster-critical, for Calico pods so they have the highest possible priority.

I've fixed the metadata API for deployments in the render package where it was incorrectly set to v1 instead of the correct apps/v1 (this issue is also present in the controllers package). This was correctly set for some deployments but not all of them.

I've also normalised the K8s imports in the render packages as there were warnings showing up about duplicate imports.

Fixes #1428.

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

@CLAassistant
Copy link

CLAassistant commented Aug 23, 2021

CLA assistant check
All committers have signed the CLA.

@stevehipwell
Copy link
Contributor Author

/assign @tmjd

@stevehipwell
Copy link
Contributor Author

I can't see the errors in Semaphore due to a 404, I'm not sure if this is expected?

@tmjd
Copy link
Member

tmjd commented Aug 23, 2021

Unfortunately I don't know there is anything I can do to open up Semaphore access. I think you should be able to get the errors if you ran make test or the target that Semaphore is running is make ci but I think the test target would show the errors that are being hit.

@stevehipwell
Copy link
Contributor Author

Thanks @tmjd I'm unable to easily run make from my Windows box but I'm able to manually run the tests in the render package and hopefully it's just these which are broken.

@tmjd
Copy link
Member

tmjd commented Aug 23, 2021

Yeah the failures all look to be in pkg/render/render_test.go and pkg/render/node_test.go

Signed-off-by: Steve Hipwell <steve.hipwell@gmail.com>
@stevehipwell
Copy link
Contributor Author

@tmjd the tests should be fixed now. Are you happy with my other changes to the imports and deployment API, nothing was "broken" but the code in the render package should now be more consistent.

@tmjd
Copy link
Member

tmjd commented Aug 23, 2021

I've been looking and the priority class changes look good.
I would have preferred the main changes and the import changes to at least be in a separate commit, if not a separate PR, since they aren't related but you don't need to split them (I'm just mentioning for any possible future PRs). If I thought we'd be picking these changes to previous releases I would request splitting the changes because additional changes would just make for more potential conflicts when picking changes.

One thing that I'm checking with some colleagues on, is to see if they have a preference on the move to v1 for the k8s.io/api/core/v1, because I think it would be more consistent with the other import names if we used corev1 but you've gone with v1 but I don't want to request that change if others don't agree. This isn't a big deal but if we're changing it I want to make sure it is the "preferred" value.

Copy link
Member

@tmjd tmjd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you've already switched to corev1, unless I misread the changes earlier.

@tmjd tmjd merged commit 3728854 into tigera:master Aug 23, 2021
@tmjd
Copy link
Member

tmjd commented Aug 23, 2021

Thank you @stevehipwell for this contribution.

@stevehipwell
Copy link
Contributor Author

Thanks for the quick review @tmjd. I looked a bit further into the codebase as switched v1 to corev1, which is my personal preference. Sorry about the single commit, some projects always want 1 per PR and I wasn't sure if you wanted separate PRs or not; noted for the future.

@stevehipwell
Copy link
Contributor Author

@tmjd I don't suppose there is a release date for the next version? Our clusters built with our new pattern that uses the tigera-operator are currently impacted by the low priority class on the calico-node daemonset.

@tmjd
Copy link
Member

tmjd commented Aug 23, 2021

I'm not sure when the next release will be, there was a release recently and we have one at least once a quarter, so the longest would be 3 months.
If you can't relieve the pressure on your cluster to ensure there is room, you could pick your changes in this PR onto the version of the operator and build a custom operator image to pick up those changes.

@stevehipwell stevehipwell deleted the use-system-priority-classes branch August 24, 2021 07:56
@stevehipwell
Copy link
Contributor Author

@tmjd I suspect that even with a split commit that this wouldn't be a candidate to be picked into the current release. Would it be possible to create a PR for the current release to use the new behaviour if an environment variable was set, and if so what would the release windows for something like that be?

The back story of this is that we have a TF module to build EKS clusters, used for a large number of clusters, which installs the common components into a system node group with a high resource pressure from system components with system-node-critical and system-cluster-critical priority classes causing the calico-node daemonset to be un-schedulable on these nodes.

@tmjd
Copy link
Member

tmjd commented Aug 24, 2021

Back porting a change and making it available through an env var or other config in a previous version is not something we would want to do. You can pick the changes into a branch based on the release you want to have updated and build an image and push it to docker hub then just switch the operator deployment you're using to use that image.

@stevehipwell I know you said you can't run make on your windows machine but you could run a Linux VM (locally or on a cloud) where you'd need to install a few dependencies and then you can do make image on your custom branch to create an image with these changes.

@stevehipwell
Copy link
Contributor Author

I thought that would be the case @tmjd. If I need an image I can create some tooling to build it, it's just that I'd rather not if at all possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Priority class calico-priority isn't high enough for the calico-node daemonset
4 participants