Skip to content

[Serve] Add RouterConfig field to DeploymentConfig to configure RequestRouter #53870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Jul 10, 2025

Conversation

eicherseiji
Copy link
Contributor

@eicherseiji eicherseiji commented Jun 17, 2025

Why are these changes needed?

Follow ups to #52725.

  • Allow custom router kwargs in LLMConfig/DeploymentConfig

Design doc. We also discussed passing kwargs directly to the derived class' __init__, but concerned that this may lead to typos getting swallowed by kwargs in RequestRouter.__init__. Instead, initialize_state without kwargs can throw a TypeError, e.g.:

class RequestRouter:

	def __init__(self, ..., request_initizer_config: dict)

		self.initialize_state(**request_initizer_config)

	def initialize_state(self, **kwargs):
		pass


class MyRouter(RequestRouter):

	def initialize_state(self, threshold: float = 0.1):
		...

	def choose_replica(self, ...):
		...

# Result is TypeError: initialize_state() got an unexpected keyword argument 'thresh'
@serve.deployment(cls=MyRouter, cls_kwargs={"thresh": 0.2})
class Deploy:
	pass

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@eicherseiji eicherseiji added the go add ONLY when ready to merge, run all tests label Jun 18, 2025
@eicherseiji
Copy link
Contributor Author

eicherseiji commented Jun 18, 2025

kwargs can be passed to a custom router class like so:

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app
from ray.serve._private.request_router.prefix_aware_router import PrefixAwarePow2ReplicaRouter

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="deepseek",
        model_source="qwen/Qwen2.5-7B-Instruct",
    ),
    runtime_env=dict(
        env_vars={"VLLM_USE_V1": "1"}
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=1),
        request_router_class=PrefixAwarePow2ReplicaRouter,
        request_router_kwargs=dict(
            imbalanced_threshold=9,
        )
    ),
    engine_kwargs=dict(
        tensor_parallel_size=2,
        pipeline_parallel_size=2,
        gpu_memory_utilization=0.92,
        dtype="auto",
        max_num_seqs=40,
        max_model_len=16384,
        enable_chunked_prefill=True,
        enable_prefix_caching=True,
        trust_remote_code=True,
    ),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)

@eicherseiji eicherseiji self-assigned this Jun 18, 2025
@eicherseiji eicherseiji marked this pull request as ready for review June 18, 2025 18:48
@Copilot Copilot AI review requested due to automatic review settings June 18, 2025 18:48
@eicherseiji eicherseiji requested review from a team as code owners June 18, 2025 18:48
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enables passing custom keyword arguments (request_router_kwargs) through the Serve configuration to user‐provided request router classes. Additionally, it refactors the prefix‐aware router’s eviction loop from asyncio tasks to a background thread and updates related tests.

  • Add request_router_kwargs field in protobuf, config model, deployment API, and router
  • Wire serialization/deserialization of request_router_kwargs
  • Refactor eviction loop in prefix_tree.py from asyncio to threading
  • Update tests to call the router’s private selection method

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/ray/protobuf/serve.proto Add new bytes request_router_kwargs field to DeploymentConfig
python/ray/serve/deployment.py Expose request_router_kwargs in options() and apply to config
python/ray/serve/_private/router.py Forward request_router_kwargs to router constructor and store it
python/ray/serve/_private/config.py Define request_router_kwargs, validate JSON, and handle proto I/O
python/ray/llm/tests/serve/cpu/deployments/test_prefix_aware_request_router.py Replace public call with private _choose_replica_for_request
python/ray/llm/_internal/serve/request_router/prefix_aware/prefix_tree.py Convert eviction loop from asyncio to a background threading.Thread
Comments suppressed due to low confidence (2)

python/ray/serve/deployment.py:241

  • [nitpick] Inserting a new parameter into the middle of the options() signature can break callers using positional arguments. Consider making it keyword-only or placing it at the end with a default.
        request_router_kwargs: Default[Union[Dict, None]] = DEFAULT.VALUE,

python/ray/llm/tests/serve/cpu/deployments/test_prefix_aware_request_router.py:127

  • [nitpick] The test now calls a private method (_choose_replica_for_request) instead of the public API (choose_replica_for_request). It's better to test via the public interface to avoid coupling to internal implementation.
            chosen = await prefix_request_router._choose_replica_for_request(req)

@eicherseiji
Copy link
Contributor Author

Hi @kouroshHakha! This is ready for your review.

Copy link
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two points:

  1. Let's separate out serve only changes from the eviction thread changes and review the serve changes with serve team
  2. Let's talk about the request router kwargs. The original intention of the design was to not expose the complexity of the constructor of the RequestRouter to the user. Right now the request_router_kwargs are passed through to the constructor which inflates the other kwargs that were supposed to stay hidden. Here is my proposal:

Modify the RequestRouter's constructor and interface this way:

class RequestRouter: 
        def __init__(self, ..., custom_init_kwargs=...)
              ...
              self.init(**custom_init_kwargs)

         def init(**kwargs): 
              # custom initialization for the Request Router. Called after the base constructor __init__ is done.

This way when I inherit this class I can simply do:

class MyRouter(RequestRouter)
        
        def init(self, param1=None)
               self.param1 = param1

        def choose_replica(...): 
              # create a policy based on self.param1


@serve.deployment(request_router_class=MyRouter, request_router_init_kwargs={"param1": 10})
class MyDeployment:
    ....

@eicherseiji eicherseiji requested a review from a team as a code owner June 30, 2025 19:05
@eicherseiji
Copy link
Contributor Author

Implementation as aligned in design doc

@eicherseiji eicherseiji force-pushed the prefix-router branch 2 times, most recently from 3cde81f to 41676a8 Compare July 1, 2025 22:27
@eicherseiji eicherseiji changed the title Pass parameters to custom routers through LLMConfig Add RouterConfig field to DeploymentConfig to configure RequestRouter Jul 1, 2025
Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some formatting issues and style nits.

Copy link
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool. I left some comments with a 5 min review with more focus around the llm specific changes. I think @abrarsheikh / @zcin should take a closer look at the serve changes.

@eicherseiji eicherseiji requested a review from zcin July 3, 2025 22:01
Copy link
Contributor

@abrarsheikh abrarsheikh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lg2m

@eicherseiji eicherseiji requested a review from kouroshHakha July 7, 2025 18:49
Copy link
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comment are left. Most important one being the stability of the API is still Alpha, so don't mark it as stable.

eicherseiji and others added 24 commits July 9, 2025 14:06
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
…s, make it private

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@kouroshHakha kouroshHakha changed the title Add RouterConfig field to DeploymentConfig to configure RequestRouter [Serve] Add RouterConfig field to DeploymentConfig to configure RequestRouter Jul 9, 2025
@kouroshHakha kouroshHakha merged commit 5322950 into ray-project:master Jul 10, 2025
5 checks passed
Sparks0219 pushed a commit to Sparks0219/ray that referenced this pull request Jul 14, 2025
…stRouter (ray-project#53870)

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: joshlee <joshlee@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants