[Feature]: Enhance offline_inference.py with Configurable Parameters for Greater Flexibility #10391

@wchen61

Description

🚀 The feature, motivation and pitch

The current offline_inference.py is relatively basic, and developers often need to modify the script by hand to meet their specific needs. For example, loading models from local paths, running multi-GPU inference, applying specific quantization algorithms, configuring LLM engine parameters, or customizing inputs (e.g., batch size, input length, output length, sampling parameters) all require editing the code directly.

We can enhance the functionality of offline_inference.py to support these features while keeping the default behavior unchanged. By introducing configurable parameters, users will be able to adapt the script to their requirements without having to manually modify the code. This enhancement would improve usability and flexibility for a wider range of scenarios.
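As a minimal sketch of what this could look like, the customizations above could be exposed through an argparse-based CLI that maps flags onto the `LLM` constructor and `SamplingParams`. The flag names, defaults, and the `run`/`engine_kwargs` helpers below are illustrative, not an actual proposal for the final interface:

```python
# Sketch: configurable offline_inference.py (flag names are hypothetical).
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Configurable offline inference")
    # Model loading: HF model name or a local path.
    parser.add_argument("--model", default="facebook/opt-125m")
    # Multi-GPU inference and quantization.
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    parser.add_argument("--quantization", default=None, help="e.g. awq, gptq")
    # Input/output customization.
    parser.add_argument("--batch-size", type=int, default=4)
    parser.add_argument("--output-len", type=int, default=16)
    # Sampling parameters.
    parser.add_argument("--temperature", type=float, default=0.8)
    parser.add_argument("--top-p", type=float, default=0.95)
    return parser


def engine_kwargs(args: argparse.Namespace) -> dict:
    """Map parsed CLI args onto vLLM LLM() constructor keywords."""
    kwargs = {
        "model": args.model,
        "tensor_parallel_size": args.tensor_parallel_size,
    }
    if args.quantization:
        kwargs["quantization"] = args.quantization
    return kwargs


def run(args: argparse.Namespace) -> None:
    # Import deferred so the parser itself is usable without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(args))
    sampling = SamplingParams(
        temperature=args.temperature,
        top_p=args.top_p,
        max_tokens=args.output_len,
    )
    prompts = ["Hello, my name is"] * args.batch_size
    for out in llm.generate(prompts, sampling):
        print(out.outputs[0].text)
```

With no flags given, the defaults reproduce the current behavior; a user could instead run, say, `--model /models/my-llama --tensor-parallel-size 2 --quantization awq` without touching the script.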

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
