Description
Your current environment
A100, CUDA 12.1. Just running simple asynchronous inference with a standard LLM model.
How would you like to use vllm
Hi, I have been using the async engine for inference, and it is convenient to handle everything in a queue, which emulates incoming requests.
My question is: can I handle a batch of requests? Since it is a wrapper around the LLM engine, I do not see why not,
so that each request becomes engine.async([prompt1, prompt2, prompt3]) -> [gen1, gen2, gen3] instead of engine.async([prompt1]) -> [gen1]. I want to make sure I can maintain the queue without causing issues with the requests received.
Finally, one more suggestion: perhaps I could copy the engine's async code along with the wrapper and modify the queuing code from there. Would that be easier instead?
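
For context, here is a minimal sketch of what I mean, assuming the AsyncLLMEngine interface where engine.generate(prompt, sampling_params, request_id) is an async generator of RequestOutput (exact imports and signatures may differ across vLLM versions, and the model name is just a placeholder): each prompt in the batch is submitted as its own request and the results are gathered, relying on the engine's continuous batching to group them internally.

```python
import asyncio
import uuid

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def generate_one(engine: AsyncLLMEngine, prompt: str,
                       params: SamplingParams) -> str:
    # Each prompt becomes its own request with a unique request_id;
    # the engine interleaves all in-flight requests in its own batches.
    request_id = str(uuid.uuid4())
    final_output = None
    async for output in engine.generate(prompt, params, request_id):
        final_output = output  # keep only the last (finished) RequestOutput
    return final_output.outputs[0].text


async def generate_batch(engine: AsyncLLMEngine,
                         prompts: list[str]) -> list[str]:
    params = SamplingParams(temperature=0.8, max_tokens=64)
    # Submitting all prompts concurrently is enough: vLLM's continuous
    # batching groups them at the engine level, so no batched call is needed.
    return await asyncio.gather(
        *(generate_one(engine, p, params) for p in prompts))


async def main() -> None:
    # "facebook/opt-125m" is just a placeholder model for this sketch.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m"))
    results = await generate_batch(engine, ["prompt1", "prompt2", "prompt3"])
    print(results)


if __name__ == "__main__":
    asyncio.run(main())
```

If that is the intended pattern, I can keep my external queue and simply fan each incoming batch out into individual requests.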
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.