Add LoadBalancingPolicy field in InferencePool #404

@Kuromesi

What would you like to be added:
As discussed in #169, I think we should add a LoadBalancingPolicy field to the InferencePool spec so that users can configure different load balancing policies and choose the one that performs best in their environment.

The schema might look like this:

type LoadBalancingPolicy string

const (
	DefaultLoadBalancingPolicy LoadBalancingPolicy = "default"
)

type LoadBalancing struct {
	// Policy specifies the load balancing policy to use when routing requests to the endpoints of the pool.
	//
	// +kubebuilder:validation:Optional
	// +kubebuilder:default="default"
	Policy LoadBalancingPolicy `json:"policy,omitempty"`

	// Maybe also provide some customization of the filter thresholds?
	// QueueThresholdCritical specifies the number of requests that can be queued before the proxy starts dropping requests.
	//
	// +kubebuilder:validation:Optional
	QueueThresholdCritical int `json:"queueThresholdCritical,omitempty"`

	// QueueingThresholdLoRA specifies the number of requests that can be queued before the proxy starts dropping requests.
	//
	// +kubebuilder:validation:Optional
	QueueingThresholdLoRA int `json:"queueingThresholdLoRA,omitempty"`
}

// InferencePoolSpec defines the desired state of InferencePool
type InferencePoolSpec struct {
	// Selector defines a map of labels to watch model server pods
	// that should be included in the InferencePool.
	// In some cases, implementations may translate this field to a Service selector, so this matches the simple
	// map used for Service selectors instead of the full Kubernetes LabelSelector type.
	//
	// +kubebuilder:validation:Required
	Selector map[LabelKey]LabelValue `json:"selector"`

	// TargetPortNumber defines the port number to access the selected model servers.
	// The number must be in the range 1 to 65535.
	//
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=65535
	// +kubebuilder:validation:Required
	TargetPortNumber int32 `json:"targetPortNumber"`

	// LoadBalancing provides load balancing options.
	//
	// +kubebuilder:validation:Optional
	LoadBalancing LoadBalancing `json:"loadBalancingPolicy,omitempty"`

	// EndpointPickerConfig specifies the configuration needed by the proxy to discover and connect to the endpoint
	// picker service that picks endpoints for the requests routed to this pool.
	EndpointPickerConfig `json:",inline"`
}
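
To make the proposal a bit more concrete, here is a rough sketch of how a consumer (for example the endpoint picker) might read the proposed LoadBalancing struct and fill in defaults. This is only an illustration, not existing code: the ResolveLoadBalancing helper and the two fallback constants (5 and 50) are hypothetical names and placeholder values, and the sketch assumes "default" is the only policy for now.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Types mirrored from the proposed schema above.
type LoadBalancingPolicy string

const DefaultLoadBalancingPolicy LoadBalancingPolicy = "default"

type LoadBalancing struct {
	Policy                 LoadBalancingPolicy `json:"policy,omitempty"`
	QueueThresholdCritical int                 `json:"queueThresholdCritical,omitempty"`
	QueueingThresholdLoRA  int                 `json:"queueingThresholdLoRA,omitempty"`
}

// Hypothetical fallback thresholds used when a field is left unset.
// The values are placeholders chosen for this sketch only.
const (
	fallbackQueueThresholdCritical = 5
	fallbackQueueingThresholdLoRA  = 50
)

// ResolveLoadBalancing is a hypothetical helper: it fills in defaults for
// unset fields and rejects policies or thresholds it does not understand.
func ResolveLoadBalancing(lb LoadBalancing) (LoadBalancing, error) {
	if lb.Policy == "" {
		lb.Policy = DefaultLoadBalancingPolicy
	}
	// Assumption: only the "default" policy exists so far.
	if lb.Policy != DefaultLoadBalancingPolicy {
		return LoadBalancing{}, fmt.Errorf("unsupported load balancing policy %q", lb.Policy)
	}
	if lb.QueueThresholdCritical < 0 || lb.QueueingThresholdLoRA < 0 {
		return LoadBalancing{}, fmt.Errorf("queue thresholds must not be negative")
	}
	// Zero means "unset" here because the fields are plain ints with omitempty.
	if lb.QueueThresholdCritical == 0 {
		lb.QueueThresholdCritical = fallbackQueueThresholdCritical
	}
	if lb.QueueingThresholdLoRA == 0 {
		lb.QueueingThresholdLoRA = fallbackQueueingThresholdLoRA
	}
	return lb, nil
}

func main() {
	// A fragment of what the proposed spec section could look like as JSON.
	raw := []byte(`{"policy":"default","queueThresholdCritical":10}`)

	var lb LoadBalancing
	if err := json.Unmarshal(raw, &lb); err != nil {
		panic(err)
	}
	resolved, err := ResolveLoadBalancing(lb)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", resolved)
}
```

One thing this sketch surfaces: with plain int fields, a consumer cannot tell "unset" from an explicit 0, so pointer types (or a minimum of 1 in validation) may be worth considering if 0 should be a valid threshold.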

Why is this needed:
