Support for node power management with cluster create and delete #146

Closed
pokearu opened this issue Mar 13, 2022 · 3 comments

@pokearu
Contributor

pokearu commented Mar 13, 2022

When clusters are created or deleted with CAPT, the bare metal nodes have to be manually powered on and set to PXE boot before cluster creation, and manually powered off after cluster deletion for the cluster to be completely removed.

Expected Behaviour

  1. When the hardware CRDs are applied and selected for a Machine, CAPT can perform an extra step to power on the nodes being made part of the cluster.
  2. During cluster deletion, once a hardware CRD is released, i.e. it is no longer tagged to a Machine, CAPT can set the next boot order to PXE and power off the nodes.

Current Behaviour

  1. Currently we manually power on nodes for cluster creation.
  2. During deletion, the nodes are left powered on and only the resources (templates, workflows) are deleted. As a result, the cluster is still reachable because the API server keeps running.

Possible Solution

An integration with a BMC power management service like PBnJ would help automate power on/off for CAPT.

@detiber detiber self-assigned this Mar 15, 2022
@crayzeigh

#147 was held up on workflow approval. That has been pushed through, and it will likely need review once completed.

@chrisdoherty4 chrisdoherty4 added the kind/feature Categorizes issue or PR as related to a new feature. label May 3, 2022
@chrisdoherty4
Member

#147 still held up. Awaiting response from author.

@chrisdoherty4
Member

chrisdoherty4 commented Jun 14, 2022

We're not integrating PBnJ into CAPT because we're integrating Rufio instead. Will get the issue closed.

/close

@jacobweinstock jacobweinstock closed this as not planned Jun 14, 2022
mergify bot added a commit that referenced this issue Jun 15, 2022
## Description
The PR enables automated power on/off of nodes that are made part of the cluster using [Rufio BMCJobs](https://github.com/tinkerbell/rufio/blob/main/api/v1alpha1/bmcjob_types.go).

1. When a hardware object is selected to be part of a cluster, a `BMCJob` is created to get it to a state ready for provisioning with Tinkerbell.
2. When a cluster is deleted, a `BMCJob` is created that powers off the hardware once it has been released (owner labels removed).
3. The rufio controller looks for these jobs and executes the listed Tasks.
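
The two kinds of job described above could be sketched roughly as follows. This is a minimal, self-contained illustration only: the `Task` and `Job` types, the action names, and the `provisionJob` and `powerOffJob` helpers are hypothetical stand-ins, not Rufio's actual `BMCJob` API (see the linked `bmcjob_types.go` for the real types). The task order used for provisioning (power off, PXE on next boot, power on) is also an assumption.

```go
package main

import "fmt"

// Task is a hypothetical, simplified stand-in for one BMC action.
// It is NOT the real Rufio type.
type Task struct {
	Action string // e.g. "power-off", "boot-pxe-once", "power-on"
}

// Job is a hypothetical stand-in for a BMCJob: an ordered list of
// tasks aimed at the BMC of a single machine.
type Job struct {
	Name   string
	BMCRef string
	Tasks  []Task
}

// provisionJob returns a job that brings a selected machine to a state
// ready for provisioning. The order of tasks here is an assumption.
func provisionJob(hardware, bmcRef string) Job {
	return Job{
		Name:   hardware + "-provision",
		BMCRef: bmcRef,
		Tasks: []Task{
			{Action: "power-off"},
			{Action: "boot-pxe-once"},
			{Action: "power-on"},
		},
	}
}

// powerOffJob returns a job that powers off a machine after it has
// been released from the cluster.
func powerOffJob(hardware, bmcRef string) Job {
	return Job{
		Name:   hardware + "-poweroff",
		BMCRef: bmcRef,
		Tasks:  []Task{{Action: "power-off"}},
	}
}

func main() {
	jobs := []Job{
		provisionJob("node-1", "bmc-node-1"),
		powerOffJob("node-1", "bmc-node-1"),
	}
	for _, j := range jobs {
		fmt.Printf("%s (bmc: %s)\n", j.Name, j.BMCRef)
		for i, t := range j.Tasks {
			fmt.Printf("  %d. %s\n", i+1, t.Action)
		}
	}
}
```

In a real controller these objects would be created through the Kubernetes API and then picked up by the Rufio controller, as step 3 describes. The sketch only models the data.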

## Why is this needed
Tries to address the issues listed in #146.
The issue mentions PBnJ, but Rufio is a k8s controller and fits well with CAPT.

## How Has This Been Tested?

1. Created a `KinD` cluster and installed `Rufio` and `CAPT`. 
2. Applied a cluster manifest as described in [QUICK-START.md](https://github.com/tinkerbell/cluster-api-provider-tinkerbell/blob/main/docs/QUICK-START.md#create-your-first-workload-cluster).
3. Monitored the nodes to check power on and PXE booting.
4. Ran `kubectl delete cluster <cluster-name>`.
5. Monitored the nodes to check power off.

## How are existing users impacted? What migration steps/scripts do we need?
No impact on existing users. `BMCJob` creation is skipped if the Tinkerbell hardware object on the cluster does not have a `.Spec.BMCRef`.
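
The opt-in behaviour described above amounts to a nil check on the BMC reference. A minimal sketch follows, assuming simplified local types: `ObjectRef`, `HardwareSpec`, `Hardware`, and `shouldCreateBMCJob` are hypothetical names that mirror only the `.Spec.BMCRef` field mentioned here, not the real Tinkerbell or CAPT definitions.

```go
package main

import "fmt"

// ObjectRef is a hypothetical, simplified reference to a BMC object.
type ObjectRef struct {
	Name string
}

// HardwareSpec mirrors only the field discussed above. The real
// Tinkerbell Hardware spec has many more fields.
type HardwareSpec struct {
	BMCRef *ObjectRef // nil when no BMC is configured for the machine
}

// Hardware is a hypothetical stand-in for the Tinkerbell hardware object.
type Hardware struct {
	Name string
	Spec HardwareSpec
}

// shouldCreateBMCJob reports whether power management applies to hw.
// Hardware without a BMC reference is left untouched, so existing
// setups that power nodes manually keep working.
func shouldCreateBMCJob(hw Hardware) bool {
	return hw.Spec.BMCRef != nil
}

func main() {
	machines := []Hardware{
		{Name: "node-with-bmc", Spec: HardwareSpec{BMCRef: &ObjectRef{Name: "bmc-1"}}},
		{Name: "node-without-bmc"},
	}
	for _, hw := range machines {
		fmt.Printf("%s: create BMCJob = %t\n", hw.Name, shouldCreateBMCJob(hw))
	}
}
```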

## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
@displague displague added this to the v0.2 milestone Aug 24, 2022