In this notebook, we will go over how to use the SDK to work interactively with a Ray cluster during development.

In [1]:
# Import pieces from codeflare-sdk
from codeflare_sdk import Cluster, ClusterConfiguration, TokenAuthentication

In [2]:
import codeflare_sdk
print(codeflare_sdk.__version__)

0.23.1


In [3]:
# Create authentication object for user permissions
# If unused, the SDK will automatically check for the default kubeconfig, then the in-cluster config
# KubeConfigFileAuthentication can also be used to specify kubeconfig path manually
auth = TokenAuthentication(
    token="sha256~v-lxC7Fd_gnWMkVwxDJAVQ8uhZCLJ1kormSDd1JdDIk",
    server="https://api.demo-01-rhsys.wzhlab.top:6443",
    skip_tls=True,
)
auth.login()



'Logged into https://api.demo-01-rhsys.wzhlab.top:6443'
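
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the credentials could instead be read from environment variables and passed to `TokenAuthentication`. The variable names `OPENSHIFT_TOKEN` and `OPENSHIFT_SERVER` are hypothetical and not defined by the SDK; only the `token`, `server`, and `skip_tls` arguments come from the cell above.

```python
import os


def token_auth_kwargs(env=os.environ):
    """Build keyword arguments for TokenAuthentication from environment variables.

    OPENSHIFT_TOKEN and OPENSHIFT_SERVER are hypothetical names chosen for
    this example; use whatever names your environment provides.
    """
    token = env.get("OPENSHIFT_TOKEN")
    server = env.get("OPENSHIFT_SERVER")
    if not token or not server:
        raise RuntimeError("Set OPENSHIFT_TOKEN and OPENSHIFT_SERVER before logging in")
    # skip_tls=True mirrors the cell above; prefer verified TLS where possible.
    return {"token": token, "server": server, "skip_tls": True}


# Usage inside the notebook (needs codeflare_sdk and a reachable cluster):
# auth = TokenAuthentication(**token_auth_kwargs())
# auth.login()
```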

Once again, let's start by running through the same cluster setup as before:

NOTE: The default images used by the CodeFlare SDK for creating a RayCluster resource depend on the installed Python version:

- For Python 3.9: 'quay.io/modh/ray:2.35.0-py39-cu121'
- For Python 3.11: 'quay.io/modh/ray:2.35.0-py311-cu121'

If you prefer to use a custom Ray image that better suits your needs, you can specify it in the image field to override the default.
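
To see which default image applies to the current kernel, the mapping from the note above can be written out as a small lookup. This is only an illustration of the documented mapping, not the SDK's own selection code; the helper name `default_ray_image` is hypothetical.

```python
import sys

# Default images as listed in the note above, keyed by (major, minor) Python version.
DEFAULT_IMAGES = {
    (3, 9): "quay.io/modh/ray:2.35.0-py39-cu121",
    (3, 11): "quay.io/modh/ray:2.35.0-py311-cu121",
}


def default_ray_image(version_info=sys.version_info):
    """Return the listed default image for a Python version, or None if none is listed."""
    return DEFAULT_IMAGES.get((version_info[0], version_info[1]))


# Show the image that matches the running interpreter (None for unlisted versions).
print(default_ray_image())
```

If the lookup returns `None`, or the listed image lacks a dependency you need, pass your own image through the `image` field as the next cell does.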

In [4]:
# Create and configure our cluster object
# The SDK will try to find the name of your default local queue based on the annotation "kueue.x-k8s.io/default-queue": "true" unless you specify the local queue manually below
cluster_name = "llama-factory-test"
cluster = Cluster(ClusterConfiguration(
    name=cluster_name,
    head_cpu_requests=1,
    head_cpu_limits=1,
    head_memory_requests=6,
    head_memory_limits=6,
    head_extended_resource_requests={'nvidia.com/gpu':0}, # For GPU enabled workloads set the head_extended_resource_requests and worker_extended_resource_requests
    worker_extended_resource_requests={'nvidia.com/gpu':0},
    num_workers=2,
    worker_cpu_requests='2',
    worker_cpu_limits=8,
    worker_memory_requests=4,
    worker_memory_limits=12,
    image="quay.io/wangzheng422/qimgs:llama-factory-ray-20250106-v08", # Optional Field 
    write_to_file=False, # When enabled Ray Cluster yaml files are written to /HOME/.codeflare/resources 
    # local_queue="local-queue-name" # Specify the local queue manually
    verify_tls=False,
))

Yaml resources loaded for llama-factory-test



In [5]:
# Bring up the cluster
cluster.up()
cluster.wait_ready()

Ray Cluster: 'llama-factory-test' has successfully been created
Waiting for requested resources to be set up...
Requested cluster is up and running!
Dashboard is ready!


In [6]:
cluster.details()

RayCluster(name='llama-factory-test', status=<RayClusterStatus.READY: 'ready'>, head_cpu_requests=1, head_cpu_limits=1, head_mem_requests='6G', head_mem_limits='6G', num_workers=2, worker_mem_requests='4G', worker_mem_limits='12G', worker_cpu_requests='2', worker_cpu_limits=8, namespace='rhods-notebooks', dashboard='https://ray-dashboard-llama-factory-test-rhods-notebooks.apps.demo-01-rhsys.wzhlab.top', worker_extended_resources={'nvidia.com/gpu': 0}, head_extended_resources={'nvidia.com/gpu': 0})

This time we will demonstrate another potential method of use: working with the Ray cluster interactively.

Using the SDK, we can get both the Ray cluster URI and dashboard URI:

In [7]:
ray_dashboard_uri = cluster.cluster_dashboard_uri()
ray_cluster_uri = cluster.cluster_uri()
print(ray_dashboard_uri)
print(ray_cluster_uri)

https://ray-dashboard-llama-factory-test-rhods-notebooks.apps.demo-01-rhsys.wzhlab.top
ray://llama-factory-test-head-svc.rhods-notebooks.svc:10001
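
The cluster URI uses the `ray://` scheme and points at the head service on the Ray client port. As a small sketch, the standard library can split it into host and port, which is handy for a quick sanity check before connecting. The helper name `split_ray_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_ray_uri(uri):
    """Split a ray:// URI into (hostname, port)."""
    parsed = urlparse(uri)
    if parsed.scheme != "ray":
        raise ValueError(f"expected a ray:// URI, got {uri!r}")
    return parsed.hostname, parsed.port


host, port = split_ray_uri("ray://llama-factory-test-head-svc.rhods-notebooks.svc:10001")
print(host, port)
```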


Now we can connect directly to our Ray cluster via the Ray Python client:

In [8]:
from codeflare_sdk import generate_cert
# Create required TLS cert and export the environment variables to enable TLS
generate_cert.generate_tls_cert(cluster_name, cluster.config.namespace)
generate_cert.export_env(cluster_name, cluster.config.namespace)
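
Before calling `ray.init`, it can help to confirm that the TLS settings are actually present in the environment. The sketch below assumes that `export_env` sets the TLS variables Ray itself reads (`RAY_USE_TLS`, `RAY_TLS_SERVER_CERT`, `RAY_TLS_SERVER_KEY`, `RAY_TLS_CA_CERT`); that is an assumption about the SDK's behavior, so adjust the list if your version differs. The helper name `missing_tls_vars` is hypothetical.

```python
import os

# Ray's TLS-related environment variables. Whether export_env sets exactly
# these is an assumption made for this example.
RAY_TLS_VARS = (
    "RAY_USE_TLS",
    "RAY_TLS_SERVER_CERT",
    "RAY_TLS_SERVER_KEY",
    "RAY_TLS_CA_CERT",
)


def missing_tls_vars(env=os.environ):
    """Return the names of TLS variables that are unset or empty."""
    return [name for name in RAY_TLS_VARS if not env.get(name)]


missing = missing_tls_vars()
if missing:
    print("TLS variables not set:", ", ".join(missing))
```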

In [9]:
# before proceeding make sure the cluster exists and the uri is not empty
assert ray_cluster_uri, "Ray cluster needs to be started and set before proceeding"

import ray

# Reset the Ray context in case one already exists.
ray.shutdown()

# Additional libraries required for model training can be installed through runtime_env, for example:
# runtime_env = {"pip": ["transformers==4.41.2", "datasets==2.17.0", "accelerate==0.31.0", "scikit-learn==1.5.0"]}
runtime_env = {}

# Establish the connection to the Ray cluster.
# NOTE: This will work for in-cluster notebook servers (RHODS/ODH), but not for local machines
# To see how to connect from your laptop, go to demo-notebooks/additional-demos/local_interactive.ipynb
ray.init(address=ray_cluster_uri, runtime_env=runtime_env, ignore_reinit_error=True)

print("Ray cluster is up and running: ", ray.is_initialized())

2025-01-09 05:52:57,153	INFO client_builder.py:244 -- Passing the following kwargs to ray.init() on the server: ignore_reinit_error
SIGTERM handler is not set because current thread is not the main thread.


Ray cluster is up and running:  True


Now that we are connected (with an empty runtime environment in this run), let's initialize the job submission client for the cluster:

In [10]:
# Initialize the Job Submission Client
client = cluster.job_client

We can now submit the LLaMA-Factory training command as a job that runs remotely on our Ray cluster:

In [11]:
submission_id = client.submit_job(
    entrypoint="llamafactory-cli train wzh/tinyllama_lora_sft_ray.yaml",
    runtime_env={
        "env_vars": {
            'USE_RAY': '1'
        },
        # 'pip': 'requirements.txt',
        'working_dir': './',
        "excludes": ["/docs/", "*.ipynb", "*.md"]
    },
)
print(submission_id)

Actor 1 IP: 10.132.0.201
Actor 2 IP: 10.132.0.201
[2025-01-09 05:54:35,087] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
[INFO|2025-01-09 05:54:39] llamafactory.cli:157 >> Initializing distributed tasks at: 10.132.0.201:29500
[2025-01-09 05:54:46,845] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
[2025-01-09 05:54:48,450] [INFO] [comm.py:637:init_distributed] cdb=None
[2025-01-09 05:54:48,450] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend gloo
[1/3] c++ -MMD -MF shm_interface.o.d -DTORCH_EXTENSION_NAME=deepspeed_shm_comm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app-root/lib64/python3.11/site-packages/deepspeed/ops/csrc/cpu/includes -isystem /opt/app-root/lib64/python3.11/site-packages/torch/include -isystem /opt/app-root/lib64/python3.11/site-packag
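
A submitted job runs asynchronously, so a notebook usually needs to wait for it to reach a terminal state before tearing the cluster down. The sketch below polls a status callable until the job reports `SUCCEEDED`, `FAILED`, or `STOPPED`, which are Ray's terminal job states. The helper `wait_for_job` is hypothetical, and the usage line assumes the client exposes `get_job_status` as Ray's job submission client does.

```python
import time

# Terminal job states reported by Ray's job API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(get_status, submission_id, poll_seconds=5.0,
                 timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status(submission_id) until a terminal state or the timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(submission_id)
        # Accept either an enum member (use its value) or a plain string.
        name = str(getattr(status, "value", status))
        if name in TERMINAL_STATES:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {submission_id} still {name} after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage in the notebook (assumes client.get_job_status is available):
# final_state = wait_for_job(client.get_job_status, submission_id)
# print(final_state)
```

Passing the status function in as an argument keeps the helper independent of any particular client object, which also makes it easy to try out with a stub.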

Once the job is complete (or after stopping it if it is still running), we can bring our Ray cluster down and clean up:

In [None]:
client.stop_job(submission_id)

In [22]:
cluster.down()

Ray Cluster: 'llama-factory-test' has successfully been deleted


In [23]:
auth.logout()

'Successfully logged out of https://api.demo-01-rhsys.wzhlab.top:6443'