-
Notifications
You must be signed in to change notification settings - Fork 102
adding pre-request plugin to requestcontrol layer #1004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
adding pre-request plugin to requestcontrol layer #1004
Conversation
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nirrozenbaum The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
||
endpoint := targetPod.Address + ":" + strconv.Itoa(int(pool.Spec.TargetPortNumber)) | ||
endpoint := net.JoinHostPort(targetPod.Address, strconv.Itoa(targetPort)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
pkg/epp/requestcontrol/director.go
Outdated
} | ||
|
||
return res, nil // TODO handle multi cycle result after defining the PostDispatch extension point | ||
return nil | ||
} | ||
|
||
// PostDispatch populates the RequestContext based on scheduling results. | ||
func (d *Director) PostDispatch(ctx context.Context, reqCtx *handlers.RequestContext, result *schedulingtypes.SchedulingResult) (*handlers.RequestContext, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed renaming PostDispatch
-> PreRequest
, this PR seems reasonable to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the function names including comments.
the use of the term Dispatch
is now removed completely.
we discussed it in the past, that this terminology might be confusing for a reader cause the request is not dispatched at this point in the code.
the new function names are:
admitRequest
- handles admission control to decide whether or not to accept the request
based on the request criticality and system saturation state.- no more
Dispatch
function. instead just callScheduler.Schedule
(there was no logic in Dispatch other than that). prepareRequest
- populates the RequestContext and calls the registered PreRequest plugins
for allowing plugging customized logic based on the scheduling results.
Overall LGTM, just some naming changes that I think belong in this PR. Thanks! |
// before a request is sent to the selected model server. | ||
type PreRequest interface { | ||
plugins.Plugin | ||
PreRequest(ctx context.Context, request *types.LLMRequest, schedulingResult *types.SchedulingResult, targetPort int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we pass targetPod as an argument instead of schedulingResult
. Given the multi-profile scheduler architecture, it's unclear if schedulingResult
is the result of one profile or end result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the new scheduler design, there are two different results under the scheduling package.
we have:
ProfileRunResult
- which represents a single profile run result.SchedulingResult
- which is a map from profile name to it'sProfileRunResult
+ a field that specifies the primary profile that should be used in the destination header.
to your question - passing SchedulingResult
and not the targetPod is intentional. PreRequest
extension point is exactly the place where we can make sense of the multi profile results.
For example, in llm-d and PD use case, this is the place where we wire prefill selected endpoint(s) in a dedicated header when returning the decode selection. This is the way we wire P + D selected endpoints.
example can be found here:
https://github.com/llm-d/llm-d-inference-scheduler/blob/0c49737834fc9f2b5213447437ac4815b1d5a98c/pkg/plugins/pre-request/pd_prerequest.go#L33-L37
to summarize, we need to keep SchedulingResult
here and not only targetPod
. Otherwise, there is no place to make sense of the results of the profiles other than the primary one.
if one wants to get the targetPod
, it can be done same as was done here:
targetPod := result.ProfileResults[result.PrimaryProfileName].TargetPod.GetPod() |
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
This PR adds the
PreRequest
extension point to requestcontrol layer.the registered
PreRequest
plugins will be invoked after a successful result was received from the scheduling layer (that is, a successful SchedulingResult). the extension allows wiring up multi profile results using the request properties (e.g., Headers that will be later added to the request using thegenerateHeaders
helper func).More specifically, this is the enabler for clean coding of the Prefill/Decode wiring in llm-d.