Network Operator simplifies the use of AMD AINICs in Kubernetes environments. It manages all networking components required to enable RDMA workloads within a Kubernetes cluster.
- AMD networking drivers - provide support for AMD AINICs and enable advanced features such as RDMA, SR-IOV, and hardware acceleration for high-performance workloads.
- Kubernetes device plugins - expose AINIC hardware capabilities to containers, allowing workloads to directly access high-speed network interfaces with minimal overhead.
- Kubernetes secondary networks - integrate with CNI plugins (like Multus) to provide dedicated network paths for data-intensive or latency-sensitive applications, ensuring optimal performance and isolation.
- AMD Network Operator Controller
- Kernel Module Management Operator
- Node Feature Discovery Operator
- K8s Network Device Plugin
- K8s Network Node Labeller
- Device Metrics Exporter
- Multus CNI
- CNI Plugins
- Kubernetes v1.29.0+
- Helm v3.2.0+
kubectlCLI tool configured to access your cluster- Networking kernel modules installed on all nodes (modprobe br_netfilter vxlan)
- Install Flannel CNI (if not present) for inter-node communication between nodes
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml- Install the Network Operator using the Helm bundle
helm install amd-network-operator network-operator-helm-k8s-v1.0.0.tgz \
-n kube-amd-network \
--create-namespace \
--set kmm.enabled=falseThis command deploys the following components: Controller, Node Feature Discovery Operator and Multus CNI plugin
- Create CR to run Device Plugin, Exporter and CNI Plugins
If you are pulling images from a private registry, create a Kubernetes secret and update networkconfig.yaml to reference it.
Create the Docker registry secret:
kubectl create secret docker-registry my-secret \
--docker-server=docker.io \
--docker-username=<username> \
--docker-password=<password> \
-n kube-amd-networkApply your network configuration:
kubectl apply -f example/networkconfig.yamlAfter the Device Plugin is up and running, it will begin discovering NIC devices and reporting them to the kubelet.
- Create a Network Attachment Definition to assign requested device to a workload
kubectl apply -f example/amd-host-device.nadThis step defines a secondary network using a NetworkAttachmentDefinition, which ensures the requested NIC or vNIC device is assigned to the workload pod via Multus.
- Create a workload requesting for a nic/vnic
kubectl apply -f example/workload.yaml- Exec into the workload pods and run IB and RCCL tests between the nodes.
root@app:/tmp# ib_write_bw -d roce_ai1 -i 1 -n 1000 -F -a -x 1 -q 1
root@app:/tmp# /tmp/vf_rccl_run.sh- Network Operator:
docker pull amdpsdo/network-operator:network-operator-0.0.1-2 - K8s Network Device Plugin
docker pull amdpsdo/k8s-network-device-plugin:v1.0.0-beta.0 - Device Metrics Exporter:
docker pull amdpsdo/device-metrics-exporter:exporter-0.0.1-139 - CNI Plugins:
docker pull amdpsdo/cni-plugins:v1.0.0-beta.0 - RCCL & IB Workload:
docker pull docker.io/rocm/roce-workload:ubuntu24_rocm7_rccl-J13A-1_anp-v1.1.0-4D_ainic-1.117.1-a-63
- IPAM plugin used to allocate and manage IP addresses for pods in a cluster in a way that avoids IP conflicts.
- Clone the whereabouts repository and apply necessary k8s manifests:
git clone https://github.com/k8snetworkplumbingwg/whereabouts && cd whereabouts
kubectl apply \
-f doc/crds/daemonset-install.yaml \
-f doc/crds/whereabouts.cni.cncf.io_ippools.yaml \
-f doc/crds/whereabouts.cni.cncf.io_overlappingrangeipreservations.yaml- RDMA CNI plugin allows network namespace isolation for RDMA workloads in a containerized environment.
- Set RDMA subsystem namespace awareness mode to
exclusiveduring OS boot on all the worker nodes.
echo "options ib_core netns_mode=0" >> /etc/modprobe.d/ib_core.conf- Clone the RDMA CNI repository and apply necessary k8s manifests:
git clone https://github.com/k8snetworkplumbingwg/rdma-cni && cd rdma-cni
kubectl apply \
-f ./deployment/rdma-cni-daemonset.yaml- To verify that the RDMA CNI plugin is functioning correctly, run
rdma link showinside the workload pod. The output should display only the RDMA device assigned to the pod in theexclusivemode, as shown below:
root@workload:/tmp# rdma link show
link ionic_0/1 state ACTIVE physical_state LINK_UP netdev net1
root@workload:/tmp#In contrast, on a system configured in RDMA shared mode, the same command will list all available RDMA devices:
root@rccl-app-5f8b8dbddb-fgr6s:/tmp# rdma link show
link roceo3/1 state ACTIVE physical_state LINK_UP netdev net1
link rocep132s0/1 state ACTIVE physical_state LINK_UP
link rocep33s0f0/1 state ACTIVE physical_state LINK_UP
link rocep33s0f1/1 state ACTIVE physical_state LINK_UP
link ionic_0/1 state ACTIVE physical_state LINK_UP
root@rccl-app-5f8b8dbddb-fgr6s:/tmp#Please refer to cni for usage examples of these CNI plugins.
To run the tool and troubleshoot issues, execute the following command:
./tools/techsupport_dump.sh -w -o yaml <node-name>Please refer to the troubleshooting page here for more details.
The AMD Network Operator is licensed under the Apache License 2.0.