🚀 The feature, motivation and pitch
When the server experiences high concurrency with many long requests, preprocessing time (which includes tokenization) becomes a bottleneck. This inflates overall latency and hurts time-to-first-token (TTFT), because requests are forced to wait in a serialized queue for preprocessing (tokenization) even while XPU compute resources may sit idle.
The current implementation in vLLM uses a single-threaded executor to handle request preprocessing (in serving_engine.py):

```python
self._tokenizer_executor = ThreadPoolExecutor(max_workers=1)
```
The cost of tokenization and detokenization scales linearly with the length of the input/output sequences. Under high load with long contexts, a queue of requests forms, each waiting for the previous one to finish tokenizing.
A multiprocessing-based tokenizer process pool could parallelize the encoding and decoding steps, breaking this serialization bottleneck. The overhead of inter-process communication (IPC) should be minor relative to the gains from parallelizing long tokenization tasks.
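
As a rough illustration (not a concrete implementation proposal), such a pool could be built on `concurrent.futures.ProcessPoolExecutor`, with each worker loading its own tokenizer once at startup. The names below (`TokenizerProcessPool`, `_init_worker`) and the use of a Hugging Face `AutoTokenizer` are assumptions for the sketch:

```python
from concurrent.futures import ProcessPoolExecutor
from transformers import AutoTokenizer

# Hypothetical sketch: each worker process loads its own tokenizer copy
# once at startup, so per-request work is just encode, done in parallel.
_worker_tokenizer = None


def _init_worker(model_name: str) -> None:
    global _worker_tokenizer
    _worker_tokenizer = AutoTokenizer.from_pretrained(model_name)


def _encode(prompt: str) -> list[int]:
    return _worker_tokenizer.encode(prompt)


class TokenizerProcessPool:
    def __init__(self, model_name: str, num_workers: int = 4):
        self._pool = ProcessPoolExecutor(
            max_workers=num_workers,
            initializer=_init_worker,
            initargs=(model_name,),
        )

    def encode_async(self, prompt: str):
        # Returns a Future; only the prompt string and the resulting
        # token IDs cross the process boundary, so IPC cost stays
        # bounded by sequence length.
        return self._pool.submit(_encode, prompt)


if __name__ == "__main__":
    pool = TokenizerProcessPool("gpt2", num_workers=4)
    futures = [pool.encode_async(f"request {i}") for i in range(8)]
    print([f.result() for f in futures])
```

Compared to the current `max_workers=1` thread, the payload crossing the process boundary is just the prompt string and the token IDs, which is what makes the IPC overhead small relative to the tokenization work itself.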
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.