Hi.
I'm trying to introduce EPP into our MaaS (Model as a Service) system, which lets users deploy LLMs as inference services on demand, anywhere, anytime.
In the current design of EPP, every InferencePool has its own EPP instance to pick endpoints. However, our platform aims to deploy a very large number of different inference service instances (10,000+), which makes deploying one EPP per pool a serious operational problem.
I'm trying to use this repo as a core dependency and refactor the EPP:InferencePool relationship from 1:1 to 1:N, mainly reusing the Director/Scheduler and plugins in our system. At the same time, I'd like to ask for advice on production deployment experience and design for EPP.
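To make the 1:N idea concrete, here is a minimal Go sketch of what I have in mind: a single EPP process that keeps one scheduler per InferencePool and dispatches each request to the scheduler for its pool. All the names here (`MultiPoolEPP`, `Scheduler`, `PickEndpoint`, `RegisterPool`, `Route`) are hypothetical placeholders for illustration only, not the actual types or APIs in this repo; the real scheduler would run its plugin chain where the round-robin stub is.

```go
package main

import (
	"fmt"
	"sync"
)

// Scheduler is a hypothetical stand-in for the repo's scheduling
// layer, scoped to one InferencePool.
type Scheduler struct {
	mu        sync.Mutex
	endpoints []string
	next      int
}

// PickEndpoint does a trivial round-robin; a real implementation
// would invoke the scheduling plugin chain instead.
func (s *Scheduler) PickEndpoint() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	ep := s.endpoints[s.next%len(s.endpoints)]
	s.next++
	return ep
}

// MultiPoolEPP holds one Scheduler per InferencePool, so a single
// EPP deployment can serve N pools instead of one.
type MultiPoolEPP struct {
	mu         sync.RWMutex
	schedulers map[string]*Scheduler
}

func NewMultiPoolEPP() *MultiPoolEPP {
	return &MultiPoolEPP{schedulers: make(map[string]*Scheduler)}
}

// RegisterPool adds (or replaces) a pool's scheduler at runtime,
// e.g. when the platform deploys a new inference service.
func (m *MultiPoolEPP) RegisterPool(pool string, endpoints []string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.schedulers[pool] = &Scheduler{endpoints: endpoints}
}

// Route looks up the pool named in the request and delegates the
// endpoint choice to that pool's scheduler.
func (m *MultiPoolEPP) Route(pool string) (string, error) {
	m.mu.RLock()
	s, ok := m.schedulers[pool]
	m.mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("unknown InferencePool %q", pool)
	}
	return s.PickEndpoint(), nil
}

func main() {
	epp := NewMultiPoolEPP()
	epp.RegisterPool("llama-pool", []string{"10.0.0.1:8000", "10.0.0.2:8000"})
	epp.RegisterPool("qwen-pool", []string{"10.0.1.1:8000"})

	ep, _ := epp.Route("llama-pool")
	fmt.Println(ep)
}
```

The main open questions for me are whether the Director/Scheduler internals assume pool-global state anywhere, and how per-pool metrics scraping would scale in this shape.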
Thanks.