CRDs, controllers and libraries for WAO.
This repository contains CRDs, controllers and libraries for WAO. They are intended to be used with wao-metrics-adapter, wao-scheduler, etc.
Install CRDs and controllers.
kubectl apply -f https://github.com/waok8s/wao-core/releases/download/v1.30.0/wao-core.yaml
Wait for the pod to be ready.
kubectl wait pod $(kubectl get pods -n wao-system -l control-plane=controller-manager -o jsonpath="{.items[0].metadata.name}") -n wao-system --for condition=Ready
Before using NodeConfig[Template], you need to do the following.
- Make sure that your nodes have Redfish API enabled and inlet temperature sensors are available.
- Make sure that you have a differential pressure API server running.
- Make sure that you have a power consumption predictor running.
- (Optional) Make sure that your nodes have
/redfish/v1/Systems/{systemId}/MachineLearningModel
Redfish property. This property is currently not supported in most Redfish implementations, but you can use MLMM to provide it.
We provide tools to test these requirements are met. Try the following.
# CLI to get inlet_temp from Redfish server.
go run github.com/waok8s/wao-core/pkg/metrics/redfish/cmd/redfish_inlettemp_cli@HEAD -h
# CLI to get delta_p from DifferentialPressureAPI server.
go run github.com/waok8s/wao-core/pkg/metrics/dpapi/cmd/dpapi_deltap_cli@HEAD -h
# CLI to get power consumption from V2InferenceProtocol server.
go run github.com/waok8s/wao-core/pkg/predictor/v2inferenceprotocol/cmd/v2ip_pcp_cli@HEAD -h
# CLI to get power consumption predictor endpoint from Redfish server.
go run github.com/waok8s/wao-core/pkg/predictor/redfish/cmd/redfish_ep_cli@HEAD -h
Note
We also provide fake implementations for testing purposes. You can use them by setting type
to Fake
in NodeConfig[Template].
Then, you can configure your nodes with NodeConfig[Template]. See Configuration for details.
NodeConfig CRD is used to configure a node for other WAO components. It provides the following information:
- Metrics collector: how to collect metrics from the node.
- Predictor: how to predict power consumption of the node.
Here is an example.
Important
Currently, only "wao-system" namespace is supported for NodeConfig, NodeConfigTemplate and related Secrets. (This is due to RBAC.)
apiVersion: wao.bitmedia.co.jp/v1beta1
kind: NodeConfig
metadata:
name: worker-0
namespace: wao-system
spec:
nodeName: worker-0
metricsCollector:
inletTemp:
type: Redfish
endpoint: "https://10.0.0.100"
basicAuthSecret:
name: "worker-0-redfish-basicauth"
fetchInterval: 10s
deltaP:
type: DifferentialPressureAPI
endpoint: "http://10.0.0.1:5000"
fetchInterval: 10s
predictor:
powerConsumption:
type: V2InferenceProtocol
endpoint: "http://10.0.0.1:8080/v2/models/myModel/versions/v0.1.0/infer"
The above example uses Redfish and DifferentialPressureAPI to collect inlet temperature and differential pressure, and uses V2InferenceProtocol to predict power consumption.
This part of the spec is used to configure how to collect inlet temperature.
type
:Redfish
orFake
.endpoint
: Endpoint URL. Ignored whentype
isFake
.basicAuthSecret
(Optional): Secret containing username and password for basic authentication. Ignored when thetype
does not require authentication.fetchInterval
(Optional): Interval to fetch metrics. Default is15s
.
inletTemp:
type: Redfish
endpoint: "https://10.0.0.100"
basicAuthSecret:
name: "worker-0-redfish-basicauth"
fetchInterval: 10s
This part of the spec is used to configure how to collect differential pressure.
type
:DifferentialPressureAPI
orFake
.endpoint
: Endpoint URL. Ignored whentype
isFake
.basicAuthSecret
(Optional): Secret containing username and password for basic authentication. Ignored when thetype
does not require authentication.fetchInterval
(Optional): Interval to fetch metrics. Default is15s
.
deltaP:
type: DifferentialPressureAPI
endpoint: "http://10.0.0.1:5000"
fetchInterval: 10s
This part of the spec is used to configure how to predict power consumption.
type
:V2InferenceProtocol
orFake
.endpoint
: Endpoint URL. Ignored whentype
isFake
.basicAuthSecret
(Optional): Secret containing username and password for basic authentication. Ignored when thetype
does not require authentication.fetchInterval
(Unused): Ignored.
powerConsumption:
type: V2InferenceProtocol
endpoint: "http://10.0.0.1:8080/v2/models/myModel/versions/v0.1.0/infer"
This part of the spec is used to configure how to get endpoint for power consumption predictor. This is useful when the endpoint is described in Redfish or other APIs.
type
:Redfish
orFake
.endpoint
: Endpoint URL. Ignored whentype
isFake
.basicAuthSecret
(Optional): Secret containing username and password for basic authentication. Ignored when thetype
does not require authentication.fetchInterval
(Unused): Ignored.
powerConsumptionEndpointProvider:
type: Redfish
endpoint: "https://10.0.0.1"
basicAuthSecret:
name: "worker-0-redfish-basicauth"
If your predictor requires authentication, you can set your Secret in powerConsumption.basicAuthSecret
while leaving other fields empty.
powerConsumption:
type: "" # Endpoint provider will set this.
endpoint: "" # Endpoint provider will set this.
basicAuthSecret:
name: "predictor-basicauth"
powerConsumptionEndpointProvider:
type: Redfish
endpoint: "https://10.0.0.1"
basicAuthSecret:
name: "worker-0-redfish-basicauth"
NodeConfigTemplate CRD is used to configure a group of nodes by selecting nodes with labels. The controller will create NodeConfig for each node. Several fields can be templated and this is useful for configuring IP addresses, etc.
Here is an example.
apiVersion: wao.bitmedia.co.jp/v1beta1
kind: NodeConfigTemplate
metadata:
name: redfish-enabled-nodes
namespace: wao-system
spec:
nodeSelector:
matchLabels:
node.kubernetes.io/instance-type: "redfish-enabled"
template:
nodeName: "" # This will be set by the controller.
metricsCollector:
inletTemp:
type: Redfish
endpoint: "https://10.0.100.{{.IPv4.Octet4}}"
basicAuthSecret:
name: "redfish-basicauth-{{.Hostname}}"
fetchInterval: 10s
deltaP:
type: DifferentialPressureAPI
endpoint: "http://10.0.0.1:5000"
fetchInterval: 10s
predictor:
powerConsumptionEndpointProvider:
type: Redfish
endpoint: "https://10.0.100.{{.IPv4.Octet4}}"
basicAuthSecret:
name: "redfish-basicauth-{{.Hostname}}"
After applying the above NodeConfigTemplate, you can see NodeConfig for each node.
$ kubectl get nodeconfig -n wao-system
NAME AGE
redfish-enabled-nodes-worker-0 10s
redfish-enabled-nodes-worker-1 10s
redfish-enabled-nodes-worker-2 10s
You can use text/template
style syntax in type
endpoint
and basicAuthSecret.name
fields, and the following variables are available.
{{.Hostname}}
:kubernetes.io/hostname
label value.{{.IPv4.Address}}
: Address value of the firstInternalIP
in Node'sstatus.addresses
.{{.IPv4.Octet1}}
{{.IPv4.Octet2}}
{{.IPv4.Octet3}}
{{.IPv4.Octet4}}
: Octet value of the above address.
You can also use sprig
functions to do some magic. Examples here.
{{ trimPrefix .Hostname "worker-" }}
: Removeworker-
prefix from the hostname.{{ add .IPv4.Octet3 10 }}
: Add10
to the 3rd octet.
This project uses Kubebuilder (v3.11) to generate the CRDs and controllers. However, codes under pkg
(contain libraries) do not follow Kubebuilder conventions.
api/wao
: CRDs.internal/controller
: Controllers.pkg/controller
: Controllers not run in the controller manager.pkg/metrics
: Custom metrics library.pkg/predictor
: Predictor library.pkg/client
: Cached clients for metrics and predictors.
Versioning: we use the same major.minor as Kubernetes, and the patch is our own.
- 2024-05-17
v1.30.0
- Support Kubernetes v1.30.
- 2024-05-07
v1.29.0
- Support Kubernetes v1.29.
- 2024-03-29
v1.28.0
- Support Kubernetes v1.28.
- 2024-03-04
v1.27.0
- First release.
- CRDs, controllers and libraries.
This work is supported by the New Energy and Industrial Technology Development Organization (NEDO) under its "Program to Develop and Promote the Commercialization of Energy Conservation Technologies to Realize a Decarbonized Society" (JPNP21005).
Copyright 2023 Bitmedia, Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.