Refactor the vllm specific code to become model server agnostic #383
/assign @BenjaminBraunDev
@ahg-g: GitHub didn't allow me to assign the following users: BenjaminBraunDev. Note that only kubernetes-sigs members with read permissions, repo collaborators, and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can have at most 10 assignees at a time. (In response to the `/assign` command above.)
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Commenting to allow assignment!
/assign @BenjaminBraunDev
Made a PR for this: #461
/close
@liu-cong: Closing this issue. (In response to the `/close` command above.)
Currently the EPP hardcodes its metric scraping to vLLM's metric names. To support other model servers, we should make this logic configurable via flags, allowing integration with any model server that adheres to the model server protocol defined in https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/docs/proposals/003-endpoint-picker-protocol/README.md