
[Feature] [scheduler-plugins] Support second scheduler mode #3852


Open

CheyuWu wants to merge 3 commits into master from feat/second-schedule

Conversation

@CheyuWu
Contributor

CheyuWu commented Jul 9, 2025

Why are these changes needed?

Currently, KubeRay only supports scheduler-plugins when it is deployed as the single (default) scheduler.
This change adds support for using scheduler-plugins as a second scheduler that runs alongside the default scheduler.
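
At the Pod level, the core of second scheduler mode is simply pointing the Ray head and worker Pods at the scheduler-plugins scheduler, while everything else keeps using the default scheduler. A minimal sketch of that idea (illustrative only; the helper name is made up and this is not the exact code in this PR):

package schedulerplugins

import corev1 "k8s.io/api/core/v1"

// setSchedulerName is an illustrative helper: in "second scheduler" mode the
// default scheduler keeps handling all other Pods, and only the Ray head and
// worker Pods are pointed at the scheduler-plugins scheduler via
// spec.schedulerName.
func setSchedulerName(pod *corev1.Pod, schedulerName string) {
	// e.g. schedulerName = "scheduler-plugins-scheduler"
	pod.Spec.SchedulerName = schedulerName
}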

Manual Testing

Common Portion

Ray operator setup

Set helm-chart/kuberay-operator/values.yaml's batchScheduler.name to scheduler-plugins-scheduler

batchScheduler:
  enabled: false
  name: "scheduler-plugins-scheduler"

Testing YAML file

  • Create a YAML file named deploy.yaml
#### deploy.yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
  labels:
    ray.io/gang-scheduling-enabled: "true"
    ray.io/scheduler-name: scheduler-plugins-scheduler
spec:
  rayVersion: '2.46.0'
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.46.0
          resources:
            limits:
              cpu: 1
              memory: 2G
            requests:
              cpu: 1
              memory: 2G
          ports:
          - containerPort: 6379
            name: gcs-server
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
  workerGroupSpecs:
  - replicas: 3
    minReplicas: 1
    maxReplicas: 5
    groupName: workergroup
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.46.0
          resources:
            limits:
              cpu: 1
              memory: 1G
            requests:
              cpu: 1
              memory: 1G

Single scheduler

CoScheduler setup

Follow the instructions - Reference

Some things that differ from the instructions:

  • Install vim in kube-scheduler-kind-control-plane

    $ apt update
    $ apt install vim
  • Fix the permission problem in kube-scheduler-kind-control-plane

    $ chmod 644 /etc/kubernetes/scheduler.conf
  • Apply missing YAML

    $ k apply -f manifests/crds/scheduling.x-k8s.io_elasticquotas.yaml
  • /etc/kubernetes/sched-cc.yaml

    Keep both default-scheduler and scheduler-plugins-scheduler profiles so that the Ray operator can still be deployed.

    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    leaderElection:
      # (Optional) Change true to false if you are not running a HA control-plane.
      leaderElect: true
    clientConnection:
      kubeconfig: /etc/kubernetes/scheduler.conf
    profiles:
    - schedulerName: default-scheduler
      plugins:
        queueSort:
          enabled:
            - name: Coscheduling
          disabled:
            - name: PrioritySort
        multiPoint:
          enabled:
            - name: Coscheduling
    - schedulerName: scheduler-plugins-scheduler
      plugins:
        queueSort:
          enabled:
            - name: Coscheduling
          disabled:
            - name: PrioritySort
        multiPoint:
          enabled:
          - name: Coscheduling

Apply deploy.yaml

Run the command to deploy the RayCluster with scheduler-plugins-scheduler and gang scheduling enabled:

$ k apply -f deploy.yaml

Result

Get Status

$ k get raycluster

NAME                 DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
raycluster-kuberay   3                 3                   4      5G       0      ready    47s
$ k get podgroup raycluster-kuberay -o yaml

apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  creationTimestamp: "2025-07-13T10:28:09Z"
  generation: 1
  name: raycluster-kuberay
  namespace: default
  ownerReferences:
  - apiVersion: ray.io/v1
    kind: RayCluster
    name: raycluster-kuberay
    uid: 2ec902e3-5d4f-4a82-b153-0ee088a8d1fe
  resourceVersion: "4685"
  uid: 9d59585d-a2a1-4523-8225-6df31b9eabd0
spec:
  minMember: 3
  minResources:
    cpu: "3"
    memory: 4G
status:
  occupiedBy: default/raycluster-kuberay
  phase: Running
  running: 4

Get the scheduler name for the Ray operator, Ray head, and Ray workers

$ k get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.schedulerName}{"\n"}{end}'

kuberay-operator-5f997dbf6c-gf9g2       default-scheduler
raycluster-kuberay-head scheduler-plugins-scheduler
raycluster-kuberay-workergroup-worker-44jd4     scheduler-plugins-scheduler
raycluster-kuberay-workergroup-worker-cqbks     scheduler-plugins-scheduler
raycluster-kuberay-workergroup-worker-nnl79     scheduler-plugins-scheduler

Modify the deploy.yaml and apply it

  workerGroupSpecs:
  - replicas: 100
    minReplicas: 1
    maxReplicas: 200

Run the command to check whether all of the Pods are in Pending status:

$ kubectl get pods -A

NAMESPACE            NAME                                            READY   STATUS    RESTARTS   AGE
default              kuberay-operator-5f997dbf6c-gf9g2               1/1     Running   0          51m
default              raycluster-kuberay-head                         0/1     Pending   0          2m58s
default              raycluster-kuberay-workergroup-worker-2h25p     0/1     Pending   0          2m57s
... (many Pending worker Pods omitted)
default              raycluster-kuberay-workergroup-worker-xqvzv     0/1     Pending   0          2m57s
default              raycluster-kuberay-workergroup-worker-z4dhc     0/1     Pending   0          2m54s
default              raycluster-kuberay-workergroup-worker-zbl9f     0/1     Pending   0          2m54s
kube-system          coredns-6f6b679f8f-4fcbz                        1/1     Running   0          74m
kube-system          coredns-6f6b679f8f-v8fm6                        1/1     Running   0          74m
kube-system          etcd-kind-control-plane                         1/1     Running   0          74m
kube-system          kindnet-9p4st                                   1/1     Running   0          74m
kube-system          kube-apiserver-kind-control-plane               1/1     Running   0          74m
kube-system          kube-controller-manager-kind-control-plane      1/1     Running   0          74m
kube-system          kube-proxy-5xv2w                                1/1     Running   0          74m
kube-system          kube-scheduler-kind-control-plane               1/1     Running   0          57m
local-path-storage   local-path-provisioner-57c5987fd4-sfx5n         1/1     Running   0          74m
scheduler-plugins    scheduler-plugins-controller-845cfd89c6-vvg4p   1/1     Running   0          60m

Second scheduler

Follow the instructions - Reference

Install the scheduler-plugins

$ helm install --repo https://scheduler-plugins.sigs.k8s.io scheduler-plugins scheduler-plugins

Check that scheduler-plugins is running:

$ kubectl get deploy

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
kuberay-operator               1/1     1            1           19s
scheduler-plugins-controller   1/1     1            1           72s
scheduler-plugins-scheduler    1/1     1            1           72s

Ray operator setup

Set helm-chart/kuberay-operator/values.yaml's batchScheduler.name to scheduler-plugins-scheduler

batchScheduler:
  enabled: false
  name: "scheduler-plugins-scheduler"

Apply deploy.yaml

Run the command to deploy the RayCluster with scheduler-plugins-scheduler and gang scheduling enabled:

$ k apply -f deploy.yaml

Result

Get Status

$ k get raycluster

NAME                 DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
raycluster-kuberay   3                 3                   4      5G       0      ready    8m41s
$ k get podgroup raycluster-kuberay -o yaml

apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  creationTimestamp: "2025-07-13T11:15:31Z"
  generation: 1
  name: raycluster-kuberay
  namespace: default
  ownerReferences:
  - apiVersion: ray.io/v1
    kind: RayCluster
    name: raycluster-kuberay
    uid: 626e1351-ca01-4759-a3ec-96fb9747019c
  resourceVersion: "2760"
  uid: 802b40d9-3ec9-4234-aaa1-f4580189a403
spec:
  minMember: 4
  minResources:
    cpu: "4"
    memory: 5G
status:
  occupiedBy: default/raycluster-kuberay
  phase: Running
  running: 4

Get the scheduler name for the Ray operator, Ray head, and Ray workers

$ k get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.schedulerName}{"\n"}{end}'

kuberay-operator-5f997dbf6c-mdj8c       default-scheduler
raycluster-kuberay-head scheduler-plugins-scheduler
raycluster-kuberay-workergroup-worker-kgjbc     scheduler-plugins-scheduler
raycluster-kuberay-workergroup-worker-rgml8     scheduler-plugins-scheduler
raycluster-kuberay-workergroup-worker-xlpwp     scheduler-plugins-scheduler
scheduler-plugins-controller-845cfd89c6-f7bqt   default-scheduler
scheduler-plugins-scheduler-5dd667cb77-99t6x    default-scheduler

Modify the deploy.yaml and apply it

  workerGroupSpecs:
  - replicas: 100
    minReplicas: 1
    maxReplicas: 200

Run the command to check whether all of the Pods are in Pending status:

$ kubectl get pods -A

NAMESPACE            NAME                                            READY   STATUS    RESTARTS   AGE
default              kuberay-operator-5f997dbf6c-mdj8c               1/1     Running   0          13m
default              raycluster-kuberay-head                         0/1     Pending   0          22s
default              raycluster-kuberay-workergroup-worker-24rss     0/1     Pending   0          15s
default              raycluster-kuberay-workergroup-worker-2l7cp     0/1     Pending   0          19s
default              raycluster-kuberay-workergroup-worker-2lrxx     0/1     Pending   0          15s
... (many Pending worker Pods omitted)
default              raycluster-kuberay-workergroup-worker-xrw7w     0/1     Pending   0          15s
default              raycluster-kuberay-workergroup-worker-xz88j     0/1     Pending   0          16s
default              raycluster-kuberay-workergroup-worker-z98xm     0/1     Pending   0          22s
default              raycluster-kuberay-workergroup-worker-zl6qs     0/1     Pending   0          17s
default              raycluster-kuberay-workergroup-worker-zv4md     0/1     Pending   0          17s
default              scheduler-plugins-controller-845cfd89c6-f7bqt   1/1     Running   0          14m
default              scheduler-plugins-scheduler-5dd667cb77-99t6x    1/1     Running   0          14m
kube-system          coredns-6f6b679f8f-gjcvt                        1/1     Running   0          22m
kube-system          coredns-6f6b679f8f-ldrt2                        1/1     Running   0          22m
kube-system          etcd-kind-control-plane                         1/1     Running   0          22m
kube-system          kindnet-mpbzl                                   1/1     Running   0          22m
kube-system          kube-apiserver-kind-control-plane               1/1     Running   0          22m
kube-system          kube-controller-manager-kind-control-plane      1/1     Running   0          22m
kube-system          kube-proxy-ftjnh                                1/1     Running   0          22m
kube-system          kube-scheduler-kind-control-plane               1/1     Running   0          22m
local-path-storage   local-path-provisioner-57c5987fd4-w2sx9         1/1     Running   0          22m

Related issue number

Closes #3769

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: Cheyu Wu <cheyu1220@gmail.com>
@CheyuWu force-pushed the feat/second-schedule branch from 09c3853 to ea71807 on July 9, 2025 14:52
@CheyuWu
Contributor Author

CheyuWu commented Jul 9, 2025

Hi @kevin85421, PTAL

@kevin85421
Member

Why do you use single scheduler for manual test?

@kevin85421
Member

cc @troychiu for review

@CheyuWu
Contributor Author

CheyuWu commented Jul 10, 2025

Why do you use single scheduler for manual test?

Hi @kevin85421
Although both default-scheduler and scheduler-plugins are configured in /etc/kubernetes/sched-cc.yaml,
the Ray Pods (head and workers) are explicitly assigned to the scheduler-plugins scheduler, as shown in:

labels:
  ray.io/scheduler-name: scheduler-plugins

and verified via:

$ kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.schedulerName}{"\n"}{end}'

This setup follows the multi-scheduler approach, where the KubeRay operator itself is scheduled by default-scheduler, and the RayCluster Pods are scheduled by scheduler-plugins.

I’ll revise the wording in the PR description to avoid confusion around the "single scheduler" statement.

@kevin85421
Member

kevin85421 left a comment

As I understand it, you deploy scheduler-plugins in "single scheduler" mode to replace the default scheduler. For "second scheduler" mode, you need to use the Helm chart to install scheduler-plugins in a separate Pod.

https://github.com/kubernetes-sigs/scheduler-plugins/blob/93126eabdf526010bf697d5963d849eab7e8e898/doc/install.md#as-a-second-scheduler

@CheyuWu
Contributor Author

CheyuWu commented Jul 10, 2025

As I understand it, you deploy scheduler-plugins in "single scheduler" mode to replace the default scheduler. For "second scheduler" mode, you need to use the Helm chart to install scheduler-plugins in a separate Pod.

https://github.com/kubernetes-sigs/scheduler-plugins/blob/93126eabdf526010bf697d5963d849eab7e8e898/doc/install.md#as-a-second-scheduler

Oops, I had a misunderstanding. I will use the second scheduler mode instead.

@CheyuWu
Contributor Author

CheyuWu commented Jul 11, 2025

Hi @kevin85421 @troychiu, I have updated the manual testing procedure, PTAL

@CheyuWu
Contributor Author

CheyuWu commented Jul 12, 2025

I have also updated the 100 pods manual testing, and all of them are in pending status

@kevin85421
Member

I have also updated the 100 pods manual testing, and all of them are in pending status

Have you tested for both single scheduler and second scheduler for this 100 Pods RayCluster CR?

@@ -90,8 +90,7 @@ func (k *KubeScheduler) AddMetadataToPod(_ context.Context, app *rayv1.RayCluster
 	if k.isGangSchedulingEnabled(app) {
 		pod.Labels[kubeSchedulerPodGroupLabelKey] = app.Name
 	}
-	// TODO(kevin85421): Currently, we only support "single scheduler" mode. If we want to support
-	// "second scheduler" mode, we need to add `schedulerName` to the pod spec.
+	pod.Spec.SchedulerName = k.Name()

Comment on lines 136 to 143
if cluster.Labels == nil {
	cluster.Labels = make(map[string]string)
}
if tt.enableGang {
	cluster.Labels["ray.io/gang-scheduling-enabled"] = "true"
} else {
	delete(cluster.Labels, "ray.io/gang-scheduling-enabled")
}

Contributor

Will this be cleaner?

Suggested change
-if cluster.Labels == nil {
-	cluster.Labels = make(map[string]string)
-}
-if tt.enableGang {
-	cluster.Labels["ray.io/gang-scheduling-enabled"] = "true"
-} else {
-	delete(cluster.Labels, "ray.io/gang-scheduling-enabled")
-}
+cluster.Labels = make(map[string]string)
+if tt.enableGang {
+	cluster.Labels["ray.io/gang-scheduling-enabled"] = "true"
+}

scheduler := &KubeScheduler{}
scheduler.AddMetadataToPod(context.TODO(), &cluster, "worker", pod)

if tt.expectedPodGroup {

Contributor

Can we simply use enableGang instead of having another parameter? I think they have a similar intention.
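
For illustration, a self-contained sketch of that suggestion (the test name, table fields, and fixtures here are assumptions, not the actual test in this PR):

package schedulerplugins

import (
	"context"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// Drive both the label setup and the expectation from a single enableGang flag,
// instead of carrying a separate expectedPodGroup parameter.
func TestAddMetadataToPod_GangScheduling(t *testing.T) {
	tests := []struct {
		name       string
		enableGang bool
	}{
		{name: "gang scheduling enabled", enableGang: true},
		{name: "gang scheduling disabled", enableGang: false},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			cluster := rayv1.RayCluster{
				ObjectMeta: metav1.ObjectMeta{Name: "raycluster-test", Labels: map[string]string{}},
			}
			if tt.enableGang {
				cluster.Labels["ray.io/gang-scheduling-enabled"] = "true"
			}
			pod := &corev1.Pod{
				ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{}},
			}

			scheduler := &KubeScheduler{}
			scheduler.AddMetadataToPod(context.TODO(), &cluster, "worker", pod)

			// The pod-group label should be present exactly when gang scheduling is enabled.
			_, hasPodGroup := pod.Labels["scheduling.x-k8s.io/pod-group"]
			if hasPodGroup != tt.enableGang {
				t.Errorf("pod-group label present = %v, want %v", hasPodGroup, tt.enableGang)
			}
		})
	}
}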

@troychiu
Contributor

troychiu commented Jul 13, 2025

As @kevin85421 mentioned, can you also double check if both modes work fine?

@CheyuWu
Contributor Author

CheyuWu commented Jul 13, 2025

@kevin85421 @troychiu,

  • I have updated the Manual Testing portion for both the single scheduler and the second scheduler.
  • Used scheduler-plugins-scheduler instead
  • Fixed the redundant parameter in the test

@@ -21,7 +21,7 @@ import (
 )

 const (
-	schedulerName string = "scheduler-plugins"
+	schedulerName string = "scheduler-plugins-scheduler"

@CheyuWu
Contributor Author

CheyuWu commented Jul 14, 2025

Yes, this is important. I will add the comment.

@@ -69,13 +69,13 @@ logging:
 #
 # 4. Use PodGroup
 # batchScheduler:
-#   name: scheduler-plugins
+#   name: scheduler-plugins-scheduler

Contributor

For user-facing config, I am not sure if we should use "scheduler-plugins" or "scheduler-plugins-scheduler". Wdyt?

Contributor Author

You are right, and it's easier to understand.

@CheyuWu
Contributor Author

CheyuWu commented Jul 14, 2025

But I think this is a little awkward; we cannot directly change GetPluginName, because

case schedulerplugins.GetPluginName():

If we need to change batchScheduler to scheduler-plugins, the code would probably be

const (
	schedulerName                 string = "scheduler-plugins"
+	defaultSchedulerName          string = "scheduler-plugins-scheduler"
	kubeSchedulerPodGroupLabelKey string = "scheduling.x-k8s.io/pod-group"
)

func GetPluginName() string {
	return schedulerName
}

func (k *KubeScheduler) Name() string {
	return defaultSchedulerName // Is it fine to change it to something like this?
}

I am not sure if there is a better idea.

Contributor

IMO, user experience is more important, so this is fine to me. However, we'll need good variable naming and comments explaining why there are two names and their corresponding responsibilities.
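
For illustration, one possible naming and commenting scheme (a sketch only; the constant names are suggestions, not the final code in this PR):

package schedulerplugins

// The KubeScheduler type is assumed to be the existing one in this package.
const (
	// pluginName is the user-facing value for batchScheduler.name in the
	// KubeRay operator configuration / Helm chart.
	pluginName string = "scheduler-plugins"
	// podSchedulerName is written into spec.schedulerName of the Ray Pods and
	// must match the scheduler deployed by the scheduler-plugins Helm chart.
	podSchedulerName string = "scheduler-plugins-scheduler"

	kubeSchedulerPodGroupLabelKey string = "scheduling.x-k8s.io/pod-group"
)

// GetPluginName returns the name users reference in the KubeRay config.
func GetPluginName() string { return pluginName }

// Name returns the scheduler name assigned to the Ray Pods.
func (k *KubeScheduler) Name() string { return podSchedulerName }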
