🚀 The feature, motivation and pitch
The current offline_inference.py functionality is relatively basic, and developers often need to modify the script by hand to meet their specific needs. For example, tasks such as loading models from local paths, running multi-GPU inference, applying specific quantization algorithms, configuring LLM engine parameters, or customizing inputs (e.g., batch size, input length, output length, sampling parameters) all require manual intervention.
We can enhance offline_inference.py to support these features while keeping the default behavior unchanged. By introducing configurable command-line parameters, users will be able to adapt the script to their requirements without editing the code, improving usability and flexibility across a wider range of scenarios.
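A minimal sketch of what this could look like, using argparse to expose the knobs mentioned above. The LLM, SamplingParams, and generate calls are vLLM's public API; the flag names, defaults, and prompt list below are illustrative assumptions, not a proposed final interface.

```python
# Sketch: a configurable offline_inference.py. Flag names/defaults are
# illustrative assumptions; only LLM, SamplingParams, and generate are
# guaranteed vLLM API.
import argparse

from vllm import LLM, SamplingParams


def main():
    parser = argparse.ArgumentParser(description="Configurable offline inference")
    # Model loading: accepts a Hugging Face model name or a local path.
    parser.add_argument("--model", default="facebook/opt-125m")
    # Multi-GPU inference via tensor parallelism.
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    # Quantization algorithm, e.g. "awq" or "gptq"; None keeps full precision.
    parser.add_argument("--quantization", default=None)
    # Input/output shape controls.
    parser.add_argument("--batch-size", type=int, default=4)
    parser.add_argument("--max-tokens", type=int, default=16)
    # Sampling parameters.
    parser.add_argument("--temperature", type=float, default=0.8)
    parser.add_argument("--top-p", type=float, default=0.95)
    args = parser.parse_args()

    # Placeholder prompts, truncated to the requested batch size.
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ][: args.batch_size]

    sampling_params = SamplingParams(
        temperature=args.temperature,
        top_p=args.top_p,
        max_tokens=args.max_tokens,
    )
    llm = LLM(
        model=args.model,
        tensor_parallel_size=args.tensor_parallel_size,
        quantization=args.quantization,
    )

    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")


if __name__ == "__main__":
    main()
```

With no flags passed, the script would behave like today's example, while something like python offline_inference.py --model /path/to/local/model --tensor-parallel-size 2 --quantization awq would cover the local-path, multi-GPU, and quantization cases in one invocation.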
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.