
adding pre-request plugin to requestcontrol layer #1004


Open · wants to merge 3 commits into main

Conversation

nirrozenbaum
Contributor

This PR adds the PreRequest extension point to the requestcontrol layer.
The registered PreRequest plugins are invoked after a successful result is received from the scheduling layer (that is, a successful SchedulingResult). The extension allows wiring up multi-profile results using the request properties (e.g., headers that are later added to the request using the generateHeaders helper func).

More specifically, this is the enabler for a clean implementation of the Prefill/Decode wiring in llm-d.
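For illustration, a minimal sketch of what such a PreRequest plugin could look like. It assumes plugins.Plugin only requires a Name() method, that LLMRequest exposes a Headers map, and that the types import path matches the repository layout; all three are assumptions, not taken from this PR:

package example

import (
	"context"
	"net"
	"strconv"

	// Import path assumed from the repository layout; adjust as needed.
	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/types"
)

// HeaderPreRequest is a hypothetical PreRequest plugin that records the selected
// endpoint as a request header (the header name and the Headers field are assumptions).
type HeaderPreRequest struct{}

// Name is assumed to be all that plugins.Plugin requires.
func (p *HeaderPreRequest) Name() string { return "header-pre-request" }

// PreRequest is invoked by the requestcontrol layer after a successful SchedulingResult.
func (p *HeaderPreRequest) PreRequest(ctx context.Context, request *types.LLMRequest, schedulingResult *types.SchedulingResult, targetPort int) {
	pod := schedulingResult.ProfileResults[schedulingResult.PrimaryProfileName].TargetPod.GetPod()
	endpoint := net.JoinHostPort(pod.Address, strconv.Itoa(targetPort))
	// Assumed: LLMRequest carries a Headers map that the generateHeaders helper later emits.
	request.Headers["x-target-endpoint"] = endpoint
}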

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 17, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nirrozenbaum
Once this PR has been reviewed and has the lgtm label, please assign arangogutierrez for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 17, 2025

netlify bot commented Jun 17, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 0182be4
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68527ccba532720008b93698
😎 Deploy Preview: https://deploy-preview-1004--gateway-api-inference-extension.netlify.app

@nirrozenbaum
Contributor Author

nirrozenbaum commented Jun 17, 2025

cc @kfswain @liu-cong @kfirtoledo


-endpoint := targetPod.Address + ":" + strconv.Itoa(int(pool.Spec.TargetPortNumber))
+endpoint := net.JoinHostPort(targetPod.Address, strconv.Itoa(targetPort))
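Side note on this change (not stated in the thread): net.JoinHostPort brackets IPv6 literals, which plain string concatenation does not. A quick illustration with a hypothetical address:

package main

import (
	"fmt"
	"net"
)

func main() {
	// Hypothetical IPv6 pod address, used only to show the difference.
	addr := "2001:db8::1"
	fmt.Println(addr + ":" + "8000")            // 2001:db8::1:8000 - ambiguous host:port
	fmt.Println(net.JoinHostPort(addr, "8000")) // [2001:db8::1]:8000 - correctly bracketed
}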
Collaborator


👍

}

return res, nil // TODO handle multi cycle result after defining the PostDispatch extension point
return nil
}

// PostDispatch populates the RequestContext based on scheduling results.
func (d *Director) PostDispatch(ctx context.Context, reqCtx *handlers.RequestContext, result *schedulingtypes.SchedulingResult) (*handlers.RequestContext, error) {
Collaborator


We discussed renaming PostDispatch -> PreRequest; this PR seems like a reasonable place to do that.

Contributor Author

@nirrozenbaum nirrozenbaum Jun 18, 2025


I've updated the function names, including comments.
The term Dispatch has been removed completely; as we discussed in the past, this terminology might be confusing for a reader because the request is not actually dispatched at this point in the code.

The new function names are (a rough sketch of the resulting flow follows the list):

  • admitRequest - handles admission control, deciding whether or not to accept the request based on the request criticality and system saturation state.
  • There is no longer a Dispatch function; the director simply calls Scheduler.Schedule (Dispatch contained no logic other than that).
  • prepareRequest - populates the RequestContext and calls the registered PreRequest plugins, allowing customized logic to be plugged in based on the scheduling results.

@kfswain
Collaborator

kfswain commented Jun 17, 2025

Overall LGTM, just some naming changes that I think belong in this PR. Thanks!

// before a request is sent to the selected model server.
type PreRequest interface {
	plugins.Plugin
	PreRequest(ctx context.Context, request *types.LLMRequest, schedulingResult *types.SchedulingResult, targetPort int)
}
Contributor


Should we pass targetPod as an argument instead of schedulingResult? Given the multi-profile scheduler architecture, it's unclear whether schedulingResult is the result of one profile or the end result.

Contributor Author

@nirrozenbaum nirrozenbaum Jun 18, 2025


With the new scheduler design, there are two different result types under the scheduling package:

  • ProfileRunResult - represents the result of a single profile run.
  • SchedulingResult - a map from profile name to its ProfileRunResult, plus a field that specifies the primary profile whose result should be used in the destination header.

To your question - passing SchedulingResult and not the targetPod is intentional. The PreRequest extension point is exactly the place where we can make sense of the multi-profile results.
For example, in llm-d and the Prefill/Decode (P/D) use case, this is where we put the selected prefill endpoint(s) in a dedicated header while returning the decode selection. This is how we wire the selected P and D endpoints together.
An example can be found here:
https://github.com/llm-d/llm-d-inference-scheduler/blob/0c49737834fc9f2b5213447437ac4815b1d5a98c/pkg/plugins/pre-request/pd_prerequest.go#L33-L37

To summarize, we need to keep SchedulingResult here and not only targetPod; otherwise, there is no place to make sense of the results of the profiles other than the primary one (a sketch illustrating this follows the snippet below).
If one only needs the targetPod, it can be obtained the same way as was done here:

targetPod := result.ProfileResults[result.PrimaryProfileName].TargetPod.GetPod()
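To make the multi-profile point concrete, here is a minimal sketch of a P/D-style PreRequest plugin, building on the imports and assumptions of the earlier sketch. The "prefill" profile name, the header name, and the Headers field are illustrative assumptions, not the linked llm-d implementation:

// Hypothetical P/D wiring: surface the prefill selection in a dedicated header,
// while the primary (decode) result is returned via the usual destination header.
type PDPreRequest struct{}

func (p *PDPreRequest) Name() string { return "pd-pre-request" }

func (p *PDPreRequest) PreRequest(ctx context.Context, request *types.LLMRequest, schedulingResult *types.SchedulingResult, targetPort int) {
	prefill, ok := schedulingResult.ProfileResults["prefill"] // profile name is illustrative
	if !ok || prefill.TargetPod == nil {
		return // no prefill result for this request; nothing extra to wire
	}
	pod := prefill.TargetPod.GetPod()
	request.Headers["x-prefill-endpoint"] = net.JoinHostPort(pod.Address, strconv.Itoa(targetPort))
}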

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 18, 2025
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants