Initial proposal for InferenceSchedulingObjective #1007

Open

kfswain wants to merge 1 commit into kubernetes-sigs:main from kfswain:iso-proposal

+257 −0

Collaborator

kfswain commented Jun 17, 2025

Still some questions to hammer out, but this is mostly a lift and shift from https://docs.google.com/document/d/1x6aI9pbTF5oOsaEQYc9n4pBBY3_AuEY2X51VKxmBSnU/edit?tab=t.0 with some corrections (distinct objectives for the same target model are possible in the current API). Added open questions.


Initial proposal for InferenceSchedulingObjective

ba8dbf3

k8s-ci-robot added the cncf-cla: yes label

netlify bot commented Jun 17, 2025

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`ba8dbf3`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6851f1adc569ab00083915e1
😎 Deploy Preview	https://deploy-preview-1007--gateway-api-inference-extension.netlify.app

k8s-ci-robot requested review from liu-cong and robscott

June 17, 2025 22:52

Contributor

k8s-ci-robot commented Jun 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain

Needs approval from an approver in each of these files:

~~OWNERS~~ [kfswain]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added approved size/L labels

nirrozenbaum reviewed

docs/proposals/1001-inference-scheduling-objective/README.md

		@@ -0,0 +1,257 @@
		# Inference Scheduling Objective

Contributor

nirrozenbaum Jun 18, 2025

Rename the directory to 1007 to match the PR number?

nirrozenbaum reviewed

docs/proposals/1001-inference-scheduling-objective/README.md

+              ```yaml
+              kind: InferenceModel
+              metadata:
+                name: llama4

Contributor

nirrozenbaum Jun 18, 2025

Typo in the name: the same name is used for two InferenceModels.

nirrozenbaum reviewed

docs/proposals/1001-inference-scheduling-objective/README.md

+              // HTTPMatch is an http matching rule. The rules are ANDed.
+              type HTTPMatch struct {
+                // ModelName matches against the model name in the body as per OpenAI protocol
+                ModelName *string

Contributor

nirrozenbaum Jun 18, 2025

If one wants to specify multiple models, all with “app: prod”, one needs to define multiple `HTTPMatch` entries and repeat the same headers in each of them.
Does it make sense to make this field an array of model names (with a comment that the list of models is ORed)?
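
For illustration only, here is a minimal sketch of how the list-based variant suggested above might look. It is not taken from the proposal: the `ModelNames` field, the simplified `HTTPHeaderMatch` (exact match, case-sensitive names), the `Matches` method, and the `llama4-canary` model name are all hypothetical, and an empty list is assumed to mean "any model".

```go
package main

import "fmt"

// HTTPHeaderMatch is a simplified, hypothetical stand-in for the Gateway API
// header matcher. Only exact, case-sensitive matching is modeled here.
type HTTPHeaderMatch struct {
	Name  string
	Value string
}

// HTTPMatch is a hypothetical variant of the proposal's struct in which the
// single ModelName is replaced by a list. Entries in ModelNames are ORed;
// the model check and every header check are ANDed.
type HTTPMatch struct {
	ModelNames []string
	Headers    []HTTPHeaderMatch
}

// Matches reports whether a request's model name and headers satisfy the rule.
// Assumption: an empty ModelNames list matches any model.
func (m HTTPMatch) Matches(model string, headers map[string]string) bool {
	if len(m.ModelNames) > 0 {
		found := false
		for _, name := range m.ModelNames {
			if name == model {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	for _, h := range m.Headers {
		v, ok := headers[h.Name]
		if !ok || v != h.Value {
			return false
		}
	}
	return true
}

func main() {
	// One rule covers several models without repeating the header matcher.
	rule := HTTPMatch{
		ModelNames: []string{"llama4", "llama4-canary"},
		Headers:    []HTTPHeaderMatch{{Name: "app", Value: "prod"}},
	}
	fmt.Println(rule.Matches("llama4", map[string]string{"app": "prod"}))
	fmt.Println(rule.Matches("llama4", map[string]string{"app": "dev"}))
	fmt.Println(rule.Matches("other", map[string]string{"app": "prod"}))
}
```

With a single `ModelName` field, the same intent would need one `HTTPMatch` entry per model, each carrying its own copy of the `app: prod` header matcher.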

nirrozenbaum reviewed

docs/proposals/1001-inference-scheduling-objective/README.md

+                // ModelName matches against the model name in the body as per OpenAI protocol
+                ModelName *string
+                // Headers specifies HTTP request header matchers.
+                Headers []HTTPHeaderMatch  // mostly as defined in the gateway api

Contributor

nirrozenbaum Jun 18, 2025

I’d be happy to see the differences between `HTTPHeaderMatch` and `GRPCHeaderMatch` and see if we can consolidate them into one.
Personally, I don’t like having two optional fields and documenting that only one is allowed, although sometimes it is inevitable.

Contributor

ahg-g Jun 18, 2025

I think we should start with HTTP only and drop gRPC for now. ext-proc only supports HTTP right now anyway.

nirrozenbaum reviewed

docs/proposals/1001-inference-scheduling-objective/README.md


		#### After

		Offload to httpRoute, EPP is now extended to override model name on: `X-Gateway-Model-Name`, added benefit of splitting on pools at the same place.

Contributor

nirrozenbaum Jun 18, 2025

Do we always require this header? Is it mandatory?
Abdullah mentioned we might want to use BBR to achieve this, but as I stated in the past (while you were out on vacation), this should be verified in terms of performance. I have some concerns about making BBR a mandatory component, as it includes an additional ext-proc, adding another hop where we parse the request body. That may be expensive, especially for requests with a large body (e.g., in a long, continuous chat).

The alternative is to require the sender to add the header, but then the sender has to specify the model name in both the header and the body, which the OpenAI schema does not require.

Neither of the two options is good IMO. It feels to me like we need additional thinking around this point in order to get it right.

Contributor

ahg-g Jun 18, 2025

BBR is not a requirement for EPP. It is a requirement if one wants to do traffic splitting or model name redirection. The idea here is that EPP is no longer concerned with those features per se; they are mostly pushed up to HTTPRoute and BBR.
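
To make the question in this thread concrete, here is a minimal sketch of one possible reading: the `X-Gateway-Model-Name` header, when present, overrides the model name, and otherwise the `model` field of the OpenAI-style request body is used. The fallback to the body is an assumption and is not confirmed anywhere in this thread; `effectiveModelName` and the `llama4-canary` value are hypothetical names, not part of EPP or BBR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// modelNameHeader is the override header named in the proposal text.
const modelNameHeader = "X-Gateway-Model-Name"

// effectiveModelName is a hypothetical helper. It returns the header value
// when the override header is set; otherwise it falls back to the "model"
// field of an OpenAI-style JSON body (the fallback is an assumption).
func effectiveModelName(h http.Header, body []byte) (string, error) {
	if v := h.Get(modelNameHeader); v != "" {
		return v, nil
	}
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("parsing request body: %w", err)
	}
	if req.Model == "" {
		return "", fmt.Errorf("no model name in header %s or request body", modelNameHeader)
	}
	return req.Model, nil
}

func main() {
	body := []byte(`{"model": "llama4", "messages": []}`)

	// No header: the model name comes from the body.
	name, err := effectiveModelName(http.Header{}, body)
	fmt.Println(name, err)

	// Header set (for example by an upstream component): it takes precedence.
	h := http.Header{}
	h.Set(modelNameHeader, "llama4-canary")
	name, err = effectiveModelName(h, body)
	fmt.Println(name, err)
}
```

Under this reading the header is optional for plain requests, and only deployments that want traffic splitting or model name redirection need something upstream to set it. Parsing the body in the fallback path is exactly the cost raised above, so it would still need to be measured.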

Labels

approved cncf-cla: yes size/L