Deployment design for EPP in production #1028

Open
@fdingiit

Hi.

I'm trying to introduce EPP into our MaaS (Model as a Service) system, which deploys LLMs as inference services for users on demand, anywhere and at any time.

In the current design of EPP, every InferencePool has its own EPP instance to pick endpoints. Our platform, however, aims to deploy a very large number of distinct inference services (10,000+), so running one EPP per pool would mean 10,000+ EPP deployments to operate, and deploying EPP itself becomes a significant problem.

I'm trying to use this repo as a core dependency and refactor the EPP:InferencePool relationship from 1:1 to 1:N, mainly reusing the Director/Scheduler and plugins in our system. At the same time, I'd like to ask for advice on production deployment experience and design for EPP.

Thanks.

Labels: needs-triage
