What would you like to be added:
As discussed in #169, I think we should provide a LoadBalancingPolicy in the InferencePool spec so that users can configure different policies and choose the one that performs best in their environment.
The schema could look something like this:
type LoadBalancingPolicy string
const (
DefaultLoadBalancingPolicy LoadBalancingPolicy = "default"
)
type LoadBalancing struct {
// Policy specifies the load balancing policy to use when routing requests to the endpoints of the pool.
//
// +kubebuilder:validation:Optional
// +kubebuilder:default="default"
Policy LoadBalancingPolicy `json:"policy,omitempty"`
// Maybe provide some customization about the filter?
// QueueThresholdCritical specifies the number of requests that can be queued before the proxy starts dropping requests.
//
// +kubebuilder:validation:Optional
QueueThresholdCritical int `json:"queueThresholdCritical,omitempty"`
// QueueingThresholdLoRA specifies the number of requests that can be queued before the proxy starts dropping requests.
//
// +kubebuilder:validation:Optional
QueueingThresholdLoRA int `json:"queueingThresholdLoRA,omitempty"`
}
// InferencePoolSpec defines the desired state of InferencePool
type InferencePoolSpec struct {
// Selector defines a map of labels to watch model server pods
// that should be included in the InferencePool.
// In some cases, implementations may translate this field to a Service selector, so this matches the simple
// map used for Service selectors instead of the full Kubernetes LabelSelector type.
//
// +kubebuilder:validation:Required
Selector map[LabelKey]LabelValue `json:"selector"`
// TargetPortNumber defines the port number to access the selected model servers.
// The number must be in the range 1 to 65535.
//
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=65535
// +kubebuilder:validation:Required
TargetPortNumber int32 `json:"targetPortNumber"`
// LoadBalancing provides load balancing options.
//
// +kubebuilder:validation:Optional
LoadBalancing LoadBalancing `json:"loadBalancing,omitempty"`
// EndpointPickerConfig specifies the configuration needed by the proxy to discover and connect to the endpoint
// picker service that picks endpoints for the requests routed to this pool.
EndpointPickerConfig `json:",inline"`
}
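To make the proposal a bit more concrete, here is a rough sketch of how a consumer of the spec (for example the endpoint picker) might resolve the proposed `LoadBalancing` block. Nothing here exists in the project: `resolveLoadBalancing` is a hypothetical helper, and the fallback thresholds (5 and 50) are placeholder values assumed only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LoadBalancingPolicy and LoadBalancing mirror the types proposed in this issue.
type LoadBalancingPolicy string

const DefaultLoadBalancingPolicy LoadBalancingPolicy = "default"

type LoadBalancing struct {
	Policy                 LoadBalancingPolicy `json:"policy,omitempty"`
	QueueThresholdCritical int                 `json:"queueThresholdCritical,omitempty"`
	QueueingThresholdLoRA  int                 `json:"queueingThresholdLoRA,omitempty"`
}

// Assumed fallback values, for illustration only.
const (
	assumedQueueThresholdCritical = 5
	assumedQueueingThresholdLoRA  = 50
)

// resolveLoadBalancing is a hypothetical helper. It fills unset fields with
// fallback values and rejects policies or thresholds it cannot handle.
func resolveLoadBalancing(in LoadBalancing) (LoadBalancing, error) {
	out := in
	if out.Policy == "" {
		out.Policy = DefaultLoadBalancingPolicy
	}
	if out.Policy != DefaultLoadBalancingPolicy {
		return LoadBalancing{}, fmt.Errorf("unsupported load balancing policy %q", out.Policy)
	}
	if out.QueueThresholdCritical < 0 || out.QueueingThresholdLoRA < 0 {
		return LoadBalancing{}, fmt.Errorf("queue thresholds must not be negative")
	}
	// With plain int fields, zero is indistinguishable from "not set".
	if out.QueueThresholdCritical == 0 {
		out.QueueThresholdCritical = assumedQueueThresholdCritical
	}
	if out.QueueingThresholdLoRA == 0 {
		out.QueueingThresholdLoRA = assumedQueueingThresholdLoRA
	}
	return out, nil
}

func main() {
	// A user overrides only the LoRA threshold and leaves the rest unset.
	raw := []byte(`{"queueingThresholdLoRA": 20}`)

	var lb LoadBalancing
	if err := json.Unmarshal(raw, &lb); err != nil {
		panic(err)
	}
	resolved, err := resolveLoadBalancing(lb)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", resolved)
}
```

One design point this surfaces: because the thresholds are plain `int` fields with `omitempty`, an explicit `0` cannot be told apart from an unset field. If `0` should be a valid threshold, pointer fields (`*int`) might be a better fit.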
Why is this needed: