
adding pre-request plugin to requestcontrol layer #1004


Open · wants to merge 3 commits into main

Conversation

nirrozenbaum
Contributor

This PR adds the PreRequest extension point to the requestcontrol layer.
The registered PreRequest plugins are invoked after a successful result is received from the scheduling layer (that is, a successful SchedulingResult). The extension allows wiring up multi-profile results using the request properties (e.g., headers that are later added to the request using the generateHeaders helper func).

More specifically, this is the enabler for a clean implementation of the Prefill/Decode wiring in llm-d.
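For illustration, a minimal sketch of what such a PreRequest plugin could look like. It assumes plugins.Plugin only requires a Name() method, that LLMRequest exposes a Headers map, and that the types import path matches the repository layout; all three are assumptions, not taken from this PR:

package example

import (
	"context"
	"net"
	"strconv"

	// Import path assumed from the repository layout; adjust as needed.
	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/types"
)

// HeaderPreRequest is a hypothetical PreRequest plugin that records the selected
// endpoint as a request header (the header name and the Headers field are assumptions).
type HeaderPreRequest struct{}

// Name is assumed to be all that plugins.Plugin requires.
func (p *HeaderPreRequest) Name() string { return "header-pre-request" }

// PreRequest is invoked by the requestcontrol layer after a successful SchedulingResult.
func (p *HeaderPreRequest) PreRequest(ctx context.Context, request *types.LLMRequest, schedulingResult *types.SchedulingResult, targetPort int) {
	pod := schedulingResult.ProfileResults[schedulingResult.PrimaryProfileName].TargetPod.GetPod()
	endpoint := net.JoinHostPort(pod.Address, strconv.Itoa(targetPort))
	// Assumed: LLMRequest carries a Headers map that the generateHeaders helper later emits.
	request.Headers["x-target-endpoint"] = endpoint
}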

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 17, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nirrozenbaum
Once this PR has been reviewed and has the lgtm label, please assign arangogutierrez for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 17, 2025

netlify bot commented Jun 17, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 0182be4
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68527ccba532720008b93698
😎 Deploy Preview: https://deploy-preview-1004--gateway-api-inference-extension.netlify.app

@nirrozenbaum
Contributor Author

nirrozenbaum commented Jun 17, 2025

cc @kfswain @liu-cong @kfirtoledo


-endpoint := targetPod.Address + ":" + strconv.Itoa(int(pool.Spec.TargetPortNumber))
+endpoint := net.JoinHostPort(targetPod.Address, strconv.Itoa(targetPort))
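Side note on this change (not stated in the thread): net.JoinHostPort brackets IPv6 literals, which plain string concatenation does not. A quick illustration with a hypothetical address:

package main

import (
	"fmt"
	"net"
)

func main() {
	// Hypothetical IPv6 pod address, used only to show the difference.
	addr := "2001:db8::1"
	fmt.Println(addr + ":" + "8000")            // 2001:db8::1:8000 - ambiguous host:port
	fmt.Println(net.JoinHostPort(addr, "8000")) // [2001:db8::1]:8000 - correctly bracketed
}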
Collaborator


👍

}

return res, nil // TODO handle multi cycle result after defining the PostDispatch extension point
return nil
}

// PostDispatch populates the RequestContext based on scheduling results.
func (d *Director) PostDispatch(ctx context.Context, reqCtx *handlers.RequestContext, result *schedulingtypes.SchedulingResult) (*handlers.RequestContext, error) {
Collaborator


We discussed renaming PostDispatch -> PreRequest; this PR seems like a reasonable place to do that.

Contributor Author

@nirrozenbaum nirrozenbaum Jun 18, 2025


I've updated the function names, including comments.
The term Dispatch has been removed completely; as we discussed in the past, this terminology might be confusing for a reader because the request is not actually dispatched at this point in the code.

The new function names are (a rough sketch of the resulting flow follows the list):

  • admitRequest - handles admission control, deciding whether or not to accept the request based on the request criticality and system saturation state.
  • There is no longer a Dispatch function; the director simply calls Scheduler.Schedule (Dispatch contained no logic other than that).
  • prepareRequest - populates the RequestContext and calls the registered PreRequest plugins, allowing customized logic to be plugged in based on the scheduling results.

@kfswain
Collaborator

kfswain commented Jun 17, 2025

Overall LGTM, just some naming changes that I think belong in this PR. Thanks!

// before a request is sent to the selected model server.
type PreRequest interface {
	plugins.Plugin
	PreRequest(ctx context.Context, request *types.LLMRequest, schedulingResult *types.SchedulingResult, targetPort int)
}
Contributor


Should we pass targetPod as an argument instead of schedulingResult? Given the multi-profile scheduler architecture, it's unclear whether schedulingResult is the result of one profile or the end result.

Contributor Author

@nirrozenbaum nirrozenbaum Jun 18, 2025


With the new scheduler design, there are two different result types under the scheduling package:

  • ProfileRunResult - represents the result of a single profile run.
  • SchedulingResult - a map from profile name to its ProfileRunResult, plus a field that specifies the primary profile whose result should be used in the destination header.

To your question - passing SchedulingResult and not the targetPod is intentional. The PreRequest extension point is exactly the place where we can make sense of the multi-profile results.
For example, in llm-d and the Prefill/Decode (P/D) use case, this is where we put the selected prefill endpoint(s) in a dedicated header while returning the decode selection. This is how we wire the selected P and D endpoints together.
An example can be found here:
https://github.com/llm-d/llm-d-inference-scheduler/blob/0c49737834fc9f2b5213447437ac4815b1d5a98c/pkg/plugins/pre-request/pd_prerequest.go#L33-L37

To summarize, we need to keep SchedulingResult here and not only targetPod; otherwise, there is no place to make sense of the results of the profiles other than the primary one (a sketch illustrating this follows the snippet below).
If one only needs the targetPod, it can be obtained the same way as was done here:

targetPod := result.ProfileResults[result.PrimaryProfileName].TargetPod.GetPod()
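To make the multi-profile point concrete, here is a minimal sketch of a P/D-style PreRequest plugin, building on the imports and assumptions of the earlier sketch. The "prefill" profile name, the header name, and the Headers field are illustrative assumptions, not the linked llm-d implementation:

// Hypothetical P/D wiring: surface the prefill selection in a dedicated header,
// while the primary (decode) result is returned via the usual destination header.
type PDPreRequest struct{}

func (p *PDPreRequest) Name() string { return "pd-pre-request" }

func (p *PDPreRequest) PreRequest(ctx context.Context, request *types.LLMRequest, schedulingResult *types.SchedulingResult, targetPort int) {
	prefill, ok := schedulingResult.ProfileResults["prefill"] // profile name is illustrative
	if !ok || prefill.TargetPod == nil {
		return // no prefill result for this request; nothing extra to wire
	}
	pod := prefill.TargetPod.GetPod()
	request.Headers["x-prefill-endpoint"] = net.JoinHostPort(pod.Address, strconv.Itoa(targetPort))
}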

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 18, 2025
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants