feat: Add IPAM block cleanup for removed nodes #10428

Open
Kevinz857 wants to merge 1 commit into master

Conversation

Kevinz857

This adds a new CleanupBlocksForRemovedNodes method to the IPAM interface and implements a new CLI command 'calicoctl ipam clean orphaned-blocks' to help administrators clean up IPAM blocks for nodes that no longer exist.

The implementation includes:

  • New interface method in libcalico-go/lib/ipam/interface.go
  • Implementation in libcalico-go/lib/ipam/cleanup.go
  • CLI command in calicoctl/calicoctl/commands/ipam/clean.go
  • Test support in kube-controllers/pkg/controllers/node/fake_ipam_cleaner.go

This resolves the issue of slow IPAM block release when nodes are removed from the cluster, providing a built-in alternative to external cleanup scripts.

Description

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

TBD

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@Kevinz857 Kevinz857 requested a review from a team as a code owner May 18, 2025 14:41
@marvin-tigera marvin-tigera added this to the Calico v3.31.0 milestone May 18, 2025
@marvin-tigera marvin-tigera added release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels May 18, 2025
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@caseydavenport
Member

@Kevinz857 thanks for the PR! I have a few questions I'd like to talk about before getting too deep into the review:

  • We already have a flow meant for operations similar to this: an ipam check command that lets you build a report on released resources, and then ipam release, which lets you release leaked objects from that report. Did you consider enhancing that to meet your use case (if it doesn't already)?
  • Release of IPAM blocks when nodes are deleted is meant to happen automatically by Calico - presumably you are not seeing this behavior? If not, I would like to get to the root cause of that rather than rely on manual intervention.

@caseydavenport
Member

> This resolves the issue of slow IPAM block release when nodes are removed from the cluster

Is there a particular GitHub issue you are referring to?

@caseydavenport caseydavenport self-assigned this May 19, 2025
@Kevinz857
Author

> This resolves the issue of slow IPAM block release when nodes are removed from the cluster
>
> Is there a particular GitHub issue you are referring to?

@caseydavenport This is about issue #8548.

We have a scenario where K8s nodes scale up and down frequently. In that scenario, ipamblocks are released very slowly, which leads to a shortage of ipamblocks. When that happens, manual cleanup is often required.

@caseydavenport
Member

caseydavenport commented May 30, 2025

@Kevinmmt the linked issue is primarily about releasing handles automatically, which the calicoctl ipam release --from-report option can already do.

I just did some work in this area that should substantially improve IPAM garbage collection performance and speed: #7508 (comment)

It might be worth seeing whether those changes fix your situation. If not, then the ipam release --from-report option is the direction I would point you in. I'd like to funnel manual IPAM GC through a single process to avoid a proliferation of different commands and tools that do similar but not quite the same thing.

5 participants