In this notebook, we will go through the basics of using the SDK to:
 - Spin up a Ray cluster with our desired resources
 - View the status and specs of our Ray cluster
 - Take down the Ray cluster when finished

In [1]:
# Import pieces from codeflare-sdk
from codeflare_sdk import Cluster, ClusterConfiguration, TokenAuthentication

In [2]:
# Create authentication object for user permissions
# If unused, the SDK automatically checks for the default kubeconfig, then the in-cluster config
# KubeConfigFileAuthentication can also be used to specify kubeconfig path manually
auth = TokenAuthentication(
    token = "sha256~x3YZsYZc8acBGGWakgblin-fnkoEfaFugS5LFEq8DXo",
    server = "https://api.demo-01-rhsys.wzhlab.top:6443",
    skip_tls=True
)
auth.login()



'Logged into https://api.demo-01-rhsys.wzhlab.top:6443'
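
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched here, reads the values from environment variables. The variable names `OPENSHIFT_TOKEN` and `OPENSHIFT_SERVER` and the helper `load_auth_settings` are hypothetical and not part of the SDK.

```python
import os


def load_auth_settings(env=None):
    """Return TokenAuthentication keyword arguments read from the environment.

    OPENSHIFT_TOKEN and OPENSHIFT_SERVER are hypothetical variable names
    chosen for this sketch; the SDK does not define them.
    """
    env = os.environ if env is None else env
    required = ("OPENSHIFT_TOKEN", "OPENSHIFT_SERVER")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "token": env["OPENSHIFT_TOKEN"],
        "server": env["OPENSHIFT_SERVER"],
        # Mirrors the cell above; prefer False when the API server
        # presents a trusted certificate.
        "skip_tls": True,
    }
```

With the variables set, the authentication object could then be built as `TokenAuthentication(**load_auth_settings())`.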

Next, we define the cluster by specifying the resources our batch workload requires. The cluster object below generates a corresponding RayCluster resource.

NOTE: The default images used by the CodeFlare SDK for creating a RayCluster resource depend on the installed Python version:

- For Python 3.9: 'quay.io/modh/ray:2.35.0-py39-cu121'
- For Python 3.11: 'quay.io/modh/ray:2.35.0-py311-cu121'

If you prefer to use a custom Ray image that better suits your needs, you can specify it in the image field to override the default.
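
To see which default applies to the current interpreter, the note above can be expressed as a small lookup. This is only an illustration built from the two image tags listed; it is not the SDK's internal logic, and `default_ray_image` is a hypothetical helper name.

```python
import sys

# Image tags copied from the note above; this mapping is an illustration,
# not the SDK's internal lookup.
DEFAULT_RAY_IMAGES = {
    (3, 9): "quay.io/modh/ray:2.35.0-py39-cu121",
    (3, 11): "quay.io/modh/ray:2.35.0-py311-cu121",
}


def default_ray_image(version=None):
    """Return the default image for a (major, minor) Python version, or None."""
    if version is None:
        version = sys.version_info[:2]
    return DEFAULT_RAY_IMAGES.get(tuple(version)[:2])
```

A `None` result means the note lists no default for that Python version, in which case setting the image field explicitly is the safer choice.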

In [3]:
# Create and configure our cluster object
# The SDK will try to find the name of your default local queue based on the annotation "kueue.x-k8s.io/default-queue": "true" unless you specify the local queue manually below
cluster = Cluster(ClusterConfiguration(
    name='raytest', 
    head_cpu_requests='500m',
    head_cpu_limits='500m',
    head_memory_requests=2,
    head_memory_limits=2,
    head_extended_resource_requests={'nvidia.com/gpu':0}, # For GPU enabled workloads set the head_extended_resource_requests and worker_extended_resource_requests
    worker_extended_resource_requests={'nvidia.com/gpu':0},
    num_workers=2,
    worker_cpu_requests='250m',
    worker_cpu_limits=1,
    worker_memory_requests=4,
    worker_memory_limits=4,
    # image="", # Optional Field 
    write_to_file=False, # When enabled Ray Cluster yaml files are written to /HOME/.codeflare/resources 
    # local_queue="local-queue-name" # Specify the local queue manually
))

Yaml resources loaded for raytest
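
Before submitting, it can help to add up what the configuration asks the queue for. The sketch below is a hypothetical helper, not an SDK function: it converts Kubernetes CPU quantities to millicores and totals the requests for one head node plus the workers. Memory values are summed as given (reported in G by `cluster.details()`).

```python
def cpu_to_millicores(value):
    """Convert a Kubernetes CPU quantity ('500m' or a number of cores) to millicores."""
    if isinstance(value, str) and value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def total_requests(head_cpu, head_mem, worker_cpu, worker_mem, num_workers):
    """Return (total millicores, total memory) for one head plus num_workers workers."""
    cpu = cpu_to_millicores(head_cpu) + num_workers * cpu_to_millicores(worker_cpu)
    mem = head_mem + num_workers * worker_mem
    return cpu, mem


# Request values from the ClusterConfiguration above.
print(total_requests("500m", 2, "250m", 4, num_workers=2))  # (1000, 10)
```

So this configuration requests 1 CPU core and 10 G of memory in total across the head and both workers.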



Next, we bring the cluster up. Calling `up()` submits the Ray cluster to the queue and starts the process of acquiring the requested resources.

In [4]:
# Bring up the cluster
cluster.up()

Ray Cluster: 'raytest' has successfully been created


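Now we check the status of the cluster and wait until it is ready for use. In this run, the status call made right after `up()` reported a non-ready state (`FAILED`); once `wait_ready()` returned, the cluster reported `READY`.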

In [5]:
cluster.status()

(<CodeFlareClusterStatus.FAILED: 5>, False)

In [6]:
cluster.wait_ready()

Waiting for requested resources to be set up...
Requested cluster is up and running!
Dashboard is ready!
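
Conceptually, waiting for readiness is a poll-until-true loop with a timeout. The following is a generic sketch of that pattern, not the SDK's implementation of `wait_ready()`; `wait_until` and its parameters are hypothetical names.

```python
import time


def wait_until(check, timeout=60.0, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)
```

Since `cluster.status()` returns a `(status, ready)` tuple, as the outputs in this notebook show, a call such as `wait_until(lambda: cluster.status()[1], timeout=300)` would poll the ready flag.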


In [8]:
cluster.status()

(<CodeFlareClusterStatus.READY: 1>, True)

Let's quickly verify that the specs of the cluster are as expected.

In [9]:
cluster.details()

RayCluster(name='raytest', status=<RayClusterStatus.READY: 'ready'>, head_cpu_requests='500m', head_cpu_limits='500m', head_mem_requests='2G', head_mem_limits='2G', num_workers=2, worker_mem_requests='4G', worker_mem_limits='4G', worker_cpu_requests='250m', worker_cpu_limits=1, namespace='rhods-notebooks', dashboard='https://ray-dashboard-raytest-rhods-notebooks.apps.demo-01-rhsys.wzhlab.top', worker_extended_resources={'nvidia.com/gpu': 0}, head_extended_resources={'nvidia.com/gpu': 0})
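
Rather than reading the long repr by eye, the check can be automated. The sketch below compares selected attributes against expected values; `find_mismatches` is a hypothetical helper, and a `SimpleNamespace` stands in for the object returned by `cluster.details()`, using field names and values from the output above.

```python
from types import SimpleNamespace


def find_mismatches(details, expected):
    """Return {field: (actual, expected)} for every field that differs."""
    mismatches = {}
    for field, want in expected.items():
        got = getattr(details, field, None)
        if got != want:
            mismatches[field] = (got, want)
    return mismatches


# Stand-in for cluster.details(), with values taken from the output above.
details = SimpleNamespace(num_workers=2, head_mem_requests="2G", worker_cpu_requests="250m")
expected = {"num_workers": 2, "head_mem_requests": "2G", "worker_cpu_requests": "250m"}
print(find_mismatches(details, expected))  # {}
```

An empty dictionary means every checked field matches the requested configuration.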

Finally, we bring the cluster down, which releases the associated resources and returns the environment to its state before the cluster was created.

In [10]:
cluster.down()

Ray Cluster: 'raytest' has successfully been deleted


In [11]:
auth.logout()

'Successfully logged out of https://api.demo-01-rhsys.wzhlab.top:6443'
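
In a script, as opposed to a notebook run cell by cell, it is easy to leave a cluster running if an error occurs mid-workload. One way to guard against that is a context manager that always cleans up. This is a sketch, assuming only the `up()`, `wait_ready()`, `down()`, and `logout()` calls used in this notebook; `running_cluster` is a hypothetical name.

```python
from contextlib import contextmanager


@contextmanager
def running_cluster(cluster, auth):
    """Bring the cluster up, and always tear it down and log out afterwards."""
    cluster.up()
    try:
        cluster.wait_ready()
        yield cluster
    finally:
        # Runs even if wait_ready() or the workload raises.
        cluster.down()
        auth.logout()
```

The workload would then run inside `with running_cluster(cluster, auth):`, and the cluster is deleted whether or not the body succeeds.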