Hi.
I'm trying to introduce EPP into our MaaS (Model as a Service) system, which lets users deploy LLMs as inference services on demand, anywhere, anytime.
In the current design of EPP, every InferencePool has its own EPP instance to pick endpoints. However, our platform aims to deploy a very large number of different inference service instances (10,000+), which makes deploying one EPP per pool a serious operational problem.
I'm trying to use this repo as a core dependency and refactor the EPP:InferencePool relationship from 1:1 to 1:N, mainly reusing the Director/Scheduler and plugins in our system. At the same time, I'd like to ask for advice on production deployment experience and design for EPP.
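To make the 1:N idea concrete, here is a minimal Go sketch of what I have in mind: a single EPP process that keeps one scheduler per InferencePool and dispatches each request to the scheduler for its pool. All the names here (`MultiPoolEPP`, `Scheduler`, `PickEndpoint`, `RegisterPool`, `Route`) are hypothetical placeholders for illustration only, not the actual types or APIs in this repo; the real scheduler would run its plugin chain where the round-robin stub is.

```go
package main

import (
	"fmt"
	"sync"
)

// Scheduler is a hypothetical stand-in for the repo's scheduling
// layer, scoped to one InferencePool.
type Scheduler struct {
	mu        sync.Mutex
	endpoints []string
	next      int
}

// PickEndpoint does a trivial round-robin; a real implementation
// would invoke the scheduling plugin chain instead.
func (s *Scheduler) PickEndpoint() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	ep := s.endpoints[s.next%len(s.endpoints)]
	s.next++
	return ep
}

// MultiPoolEPP holds one Scheduler per InferencePool, so a single
// EPP deployment can serve N pools instead of one.
type MultiPoolEPP struct {
	mu         sync.RWMutex
	schedulers map[string]*Scheduler
}

func NewMultiPoolEPP() *MultiPoolEPP {
	return &MultiPoolEPP{schedulers: make(map[string]*Scheduler)}
}

// RegisterPool adds (or replaces) a pool's scheduler at runtime,
// e.g. when the platform deploys a new inference service.
func (m *MultiPoolEPP) RegisterPool(pool string, endpoints []string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.schedulers[pool] = &Scheduler{endpoints: endpoints}
}

// Route looks up the pool named in the request and delegates the
// endpoint choice to that pool's scheduler.
func (m *MultiPoolEPP) Route(pool string) (string, error) {
	m.mu.RLock()
	s, ok := m.schedulers[pool]
	m.mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("unknown InferencePool %q", pool)
	}
	return s.PickEndpoint(), nil
}

func main() {
	epp := NewMultiPoolEPP()
	epp.RegisterPool("llama-pool", []string{"10.0.0.1:8000", "10.0.0.2:8000"})
	epp.RegisterPool("qwen-pool", []string{"10.0.1.1:8000"})

	ep, _ := epp.Route("llama-pool")
	fmt.Println(ep)
}
```

The main open questions for me are whether the Director/Scheduler internals assume pool-global state anywhere, and how per-pool metrics scraping would scale in this shape.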
Thanks.