Integrate PBNJ into Tinkerbell k8s model #121

Closed
pokearu opened this issue Mar 15, 2022 · 6 comments

pokearu commented Mar 15, 2022

Currently, PBNJ is a standalone service that performs power management operations. It would be beneficial to formally integrate it with the Tinkerbell stack as part of the changes for the k8s resource model.

Expected Behaviour

When provisioning baremetal nodes using Tinkerbell, the PBNJ component would be responsible for the power/boot management of the nodes. The Hardware CRD can be extended to contain the necessary BMC information, which PBNJ can use to perform actions. This would help with powering on nodes, creating BMC users, and setting boot options. It also opens up the possibility of deprovisioning nodes, performing reboots/resets, etc.

Current Behaviour

Manual intervention is required for powering up baremetal nodes and setting the boot order to net boot for Tinkerbell provisioning.

Initial Ideas

These are some rough ideas that can be discussed and expanded to a more formal proposal.

PBNJ as k8s Service

Currently, PBNJ is a gRPC service. It can be run on the k8s cluster along with all the other Tinkerbell components (Boots, Hegel). The PBNJ service would have read access to the Hardware CRDs to fetch the BMC information and perform actions.

PBNJ as a k8s Controller

PBNJ can be redesigned to be a k8s controller. The controller could watch Workflow CRDs, pick up the tasks tagged for it, and perform power management actions.

PBNJ as a Hub action

This idea is based on tink-worker. We could have a long-running pbnj-worker on the same cluster as the Tinkerbell stack. The pbnj-worker could run hub actions, which use the PBNJ binary to perform power management tasks.

@micahhausler micahhausler mentioned this issue Apr 11, 2022
3 tasks
mergify bot added a commit that referenced this issue Apr 11, 2022
Signed-off-by: Micah Hausler <mhausler@amazon.com>

## Description

Enable verbose `make help` output and re-arrange some targets. This uses the help formatting from [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder/blob/028566615757f423b09872b18bab189a65de2b3d/testdata/project-v2/Makefile#L30-L32) 

Before
```
$ make help
buf-lint                       run linting
build                          compile the binary for the native OS
cover                          Run unit tests with coverage report
darwin                         complie for darwin
evans                          run evans grpc client
goimports-ci                   run goimports for ci
goimports                      run goimports
image                          make the Container Image
linux                          complie for linux
pbs-docker-image               generate container image for building protocol buffers 
pbs-docker                     generate go stubs from protocol buffers in a container
pbs-install-deps               locally install dependencies in order to generate go stubs from protocol buffers
pbs                            locally generate go stubs from protocol buffers
ruby-client-demo               run ruby client demo
run-image                      run PBnJ container image
run-server                     run server locally
test-ci                        run tests for ci and codecov
test-functional                run functional tests
test                           run tests
```

After
```
$ make help

Usage:
  make <target>
  help             Display this help.

Build
  darwin           complie for darwin
  linux            complie for linux
  build            compile the binary for the native OS
  image            make the Container Image

Development
  test             run tests
  test-ci          run tests for ci and codecov
  test-functional  run functional tests
  goimports-ci     run goimports for ci
  goimports        run goimports
  cover            Run unit tests with coverage report
  buf-lint         run linting
  run-server       run server locally
  pbs              locally generate go stubs from protocol buffers
  pbs-install-deps  locally install dependencies in order to generate go stubs from protocol buffers
  pbs-docker       generate go stubs from protocol buffers in a container
  pbs-docker-image  generate container image for building protocol buffers
  run-image        run PBnJ container image

Clients
  ruby-client-demo  run ruby client demo
  evans            run evans grpc client

```

## Why is this needed

While working on #121, the kubebuilder-generated make targets can be viewed in their own sections.

## How Has This Been Tested?

N/A

## How are existing users impacted? What migration steps/scripts do we need?

No Change

## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade

pokearu commented Apr 12, 2022

Upon consideration of the initial ideas, PBNJ as a k8s controller is the approach I wish to elaborate and push forward.

PBNJ as a k8s controller

In this approach, we convert PBNJ into a k8s controller whose reconcile loop performs the desired PBNJ power/boot management actions.

Hardware CR changes

We require an initial update to the Tinkerbell Hardware CRD. The idea here is that the Hardware CR would have a reference to its corresponding BMC object.

type HardwareSpec struct {
    ...

    // BmcRef references the BMC object that corresponds to this hardware.
    BmcRef BmcReference `json:"bmcRef,omitempty"`
}
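
As a rough illustration of the reference field above, the sketch below fills in a possible `BmcReference`. The issue names the type but does not define it, so the `Name` and `Namespace` fields are assumptions, and the existing `HardwareSpec` fields are left out. It also shows one behavior of Go's `encoding/json` that may matter here: `omitempty` does not drop a non-pointer struct field, so an unset `BmcRef` still encodes as an empty object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BmcReference is a hypothetical shape for the reference type. The fields
// are assumptions made for illustration only.
type BmcReference struct {
	Name      string `json:"name,omitempty"`
	Namespace string `json:"namespace,omitempty"`
}

// HardwareSpec shows only the proposed field; existing fields are omitted.
type HardwareSpec struct {
	BmcRef BmcReference `json:"bmcRef,omitempty"`
}

// encodeSpec returns the JSON encoding of a HardwareSpec, or an empty
// string if encoding fails.
func encodeSpec(s HardwareSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	withRef := HardwareSpec{BmcRef: BmcReference{Name: "bmc-1", Namespace: "default"}}
	fmt.Println(encodeSpec(withRef))

	// A struct value is never "empty" for omitempty, so bmcRef is still emitted.
	fmt.Println(encodeSpec(HardwareSpec{}))
}
```

If the field should disappear entirely when unset, a pointer (`*BmcReference`) would be one way to get that behavior.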

BMC CRD

The PBNJ controller would be responsible for reconciling and maintaining the desired state of the BMC object on the cluster. The BMC object contains the required BMC information, such as host IP and vendor, along with the desired state of the BMC, such as power, boot preference, and NTP.
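
To make the desired-state idea concrete, here is a minimal sketch of how a reconcile step might compare the desired state with the observed state and decide what to do. All type names, field names, and action strings are assumptions for illustration. They are not the actual CRD schema, and no real BMC or Kubernetes API is called.

```go
package main

import "fmt"

// PowerState and every name below are illustrative assumptions.
type PowerState string

const (
	PowerOn  PowerState = "on"
	PowerOff PowerState = "off"
)

// BMCSpec holds connection details plus the desired state of the BMC.
type BMCSpec struct {
	Host       string
	Vendor     string
	Power      PowerState
	BootDevice string
}

// BMCStatus holds the state last observed on the BMC.
type BMCStatus struct {
	Power      PowerState
	BootDevice string
}

// planActions lists the steps needed to move the observed state to the
// desired state. An empty desired field means "no preference".
func planActions(spec BMCSpec, status BMCStatus) []string {
	var actions []string
	if spec.BootDevice != "" && spec.BootDevice != status.BootDevice {
		actions = append(actions, "set-boot:"+spec.BootDevice)
	}
	if spec.Power != "" && spec.Power != status.Power {
		actions = append(actions, "set-power:"+string(spec.Power))
	}
	return actions
}

func main() {
	spec := BMCSpec{Host: "192.0.2.10", Vendor: "example", Power: PowerOn, BootDevice: "pxe"}

	// Observed state differs from desired state, so two actions are planned.
	fmt.Println(planActions(spec, BMCStatus{Power: PowerOff, BootDevice: "disk"}))

	// Observed state already matches, so the plan is empty.
	fmt.Println(len(planActions(spec, BMCStatus{Power: PowerOn, BootDevice: "pxe"})))
}
```

Keeping the comparison in a pure function like this makes the reconcile decision easy to test without a BMC.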

PowerJob CRD

In addition to maintaining the desired state of the BMC, the PBNJ controller can perform a desired set of actions as a one-off job. The job may include tasks like Power Off -> Set one-time Net boot -> Power On -> Set persistent Disk boot. Once the job is complete, the controller brings the machines back to their desired state. This gives clients the flexibility to power cycle or reset nodes for updates/maintenance.
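
The one-off job described above can be sketched as an ordered list of tasks that runs until the first failure. The `Task` fields, the `Executor` function type, and `errUnreachable` are hypothetical names chosen for this example. A real controller would call the BMC where the stand-in executors appear.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable is a stand-in error used to simulate a failing BMC call.
var errUnreachable = errors.New("bmc unreachable")

// Task is a hypothetical single step of a power job.
type Task struct {
	Kind       string // "power" or "boot"
	Value      string // e.g. "off", "on", "pxe", "disk"
	Persistent bool   // only meaningful for boot tasks
}

// Executor performs one task against a BMC.
type Executor func(Task) error

// runJob executes tasks in order and stops at the first error. It returns
// the number of tasks that completed successfully.
func runJob(tasks []Task, exec Executor) (int, error) {
	for i, t := range tasks {
		if err := exec(t); err != nil {
			return i, fmt.Errorf("task %d (%s %s): %w", i, t.Kind, t.Value, err)
		}
	}
	return len(tasks), nil
}

func main() {
	// Power Off -> one-time net boot -> Power On -> persistent disk boot.
	job := []Task{
		{Kind: "power", Value: "off"},
		{Kind: "boot", Value: "pxe"},
		{Kind: "power", Value: "on"},
		{Kind: "boot", Value: "disk", Persistent: true},
	}

	// Stand-in executor that only logs each task.
	logOnly := func(t Task) error {
		fmt.Printf("%s -> %s (persistent=%t)\n", t.Kind, t.Value, t.Persistent)
		return nil
	}
	done, err := runJob(job, logOnly)
	fmt.Println(done, err)

	// Stand-in executor that fails when asked to power on.
	failOnPowerOn := func(t Task) error {
		if t.Kind == "power" && t.Value == "on" {
			return errUnreachable
		}
		return nil
	}
	done, err = runJob(job, failOnPowerOn)
	fmt.Println(done, err)
}
```

Returning the count of completed tasks lets a caller record how far a job got before it failed.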

The Client

In this approach, the client of the PBNJ controller can be an end user who runs kubectl apply on BMC objects to set the desired state for all BMCs in a data center. Alternatively, automation like CAPT can create the necessary objects to get nodes to the desired power state for provisioning.

@chrisdoherty4
Member

@jacobweinstock This probably deserves some labeling given we're pushing ahead.

@chrisdoherty4
Member

Note that the implementation isn't landing in pbnj; it'll be in its own repository. Currently that's the rufio repository, but it may get renamed. This issue is probably worth leaving open until that work is complete, just for tracking and linking purposes.

@displague
Member

@chrisdoherty4 would you consider closing this now that https://github.com/tinkerbell/rufio has come along a bit further? Either way, perhaps offer a diff of the points raised in @pokearu's two comments that define the goals.

@chrisdoherty4
Member

When running with a Kube back-end, I'm not sure PBnJ makes sense because all the interactions it's required for are handled by Rufio.

If we want to talk about changing to use the Kubernetes back-end as the primary/only back-end, then I suspect Rufio would only need integrating if users want to talk to BMCs through a request-response type API. This feels like a bigger discussion than this ticket, and other issues in the Tinkerbell space have similar commentary. Let's chat at a community meeting.

@jacobweinstock
Member

Closing this as github.com/tinkerbell/rufio provides a Kubernetes-based BMC service.
