🚀 The feature, motivation and pitch
The current offline_inference.py functionality is relatively basic, and developers often need to modify the script by hand to meet their specific needs. For example, tasks such as loading models from local paths, running multi-GPU inference, applying specific quantization algorithms, configuring LLM engine parameters, or customizing inputs (e.g., batch size, input length, output length, sampling parameters) all require manual intervention.
We can enhance offline_inference.py to support these features while keeping the default behavior unchanged. By introducing configurable command-line parameters, users will be able to adapt the script to their requirements without editing the code, improving usability and flexibility across a wider range of scenarios.
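A minimal sketch of what this could look like, using argparse to expose the knobs mentioned above. The LLM, SamplingParams, and generate calls are vLLM's public API; the flag names, defaults, and prompt list below are illustrative assumptions, not a proposed final interface.

```python
# Sketch: a configurable offline_inference.py. Flag names/defaults are
# illustrative assumptions; only LLM, SamplingParams, and generate are
# guaranteed vLLM API.
import argparse

from vllm import LLM, SamplingParams


def main():
    parser = argparse.ArgumentParser(description="Configurable offline inference")
    # Model loading: accepts a Hugging Face model name or a local path.
    parser.add_argument("--model", default="facebook/opt-125m")
    # Multi-GPU inference via tensor parallelism.
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    # Quantization algorithm, e.g. "awq" or "gptq"; None keeps full precision.
    parser.add_argument("--quantization", default=None)
    # Input/output shape controls.
    parser.add_argument("--batch-size", type=int, default=4)
    parser.add_argument("--max-tokens", type=int, default=16)
    # Sampling parameters.
    parser.add_argument("--temperature", type=float, default=0.8)
    parser.add_argument("--top-p", type=float, default=0.95)
    args = parser.parse_args()

    # Placeholder prompts, truncated to the requested batch size.
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ][: args.batch_size]

    sampling_params = SamplingParams(
        temperature=args.temperature,
        top_p=args.top_p,
        max_tokens=args.max_tokens,
    )
    llm = LLM(
        model=args.model,
        tensor_parallel_size=args.tensor_parallel_size,
        quantization=args.quantization,
    )

    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")


if __name__ == "__main__":
    main()
```

With no flags passed, the script would behave like today's example, while something like python offline_inference.py --model /path/to/local/model --tensor-parallel-size 2 --quantization awq would cover the local-path, multi-GPU, and quantization cases in one invocation.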
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.