[Feature]: Enabling draft model based speculative decoding for CPUs #28384

### 🚀 The feature, motivation and pitch

### Current Implementation Status
This [branch](https://github.com/zihaoanllm/vllm/tree/model/integrate-pard-0521) contains the PARD implementation of speculative decoding for V0. However, it was written a few months ago and does not work with V1.

In V1, vLLM's speculative decoding does not support draft models. Requesting one raises the following error:
```NotImplementedError: Draft model speculative decoding is not supported yet. Please consider using other speculative decoding methods such as ngram, medusa, eagle, or mtp.```
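
A minimal sketch of how this error might be reached, assuming the draft model is passed through the `speculative_config` argument of `vllm.LLM` with the `model` and `num_speculative_tokens` keys. The issue does not give the exact command, so the helper names and the value `4` are illustrative assumptions, and the model names are left to the caller.

```python
from typing import Any, Dict


def build_speculative_config(draft_model: str, num_speculative_tokens: int) -> Dict[str, Any]:
    """Build the dict passed as `speculative_config` (hypothetical helper)."""
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be >= 1")
    return {"model": draft_model, "num_speculative_tokens": num_speculative_tokens}


def reproduce(target_model: str, draft_model: str) -> None:
    """Try to start vLLM with a draft model (hypothetical helper; needs vLLM installed)."""
    # Imported lazily so build_speculative_config works without vLLM.
    from vllm import LLM

    # Assumption: 4 speculative tokens; the issue does not state a value.
    LLM(
        model=target_model,
        speculative_config=build_speculative_config(draft_model, 4),
    )
```

Calling `reproduce(...)` with a target model and a smaller draft model of the same tokenizer family would be expected to hit the `NotImplementedError` above on a V1 build without draft model support.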

Other speculative decoding methods, such as eagle and ngram, also fail on CPU with the following assertion:
```AssertionError: spec decode is not supported.``` [PermaLink](https://github.com/vllm-project/vllm/blob/11fd69dd54060a59c6f62a6d217e1ecc47d74a68/vllm/v1/worker/cpu_model_runner.py#L27)

I found a [PR](https://github.com/vllm-project/vllm/pull/24322) that adds draft model support, but it does not mention CPU support.

### Feature Request
Enable draft-model-based speculative decoding on CPUs.
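
For context, here is a toy sketch of what draft-model speculative decoding does, using greedy verification. It is not vLLM's implementation: the "models" are plain Python functions invented for illustration, and a real engine would verify all proposed tokens in one batched forward pass of the target model.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token context to the next token


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the accepted tokens."""
    # The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    # The target model checks each proposed token in order.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # keep the target's token, drop the rest
            return accepted
        accepted.append(tok)

    # Every proposal matched, so the target contributes one bonus token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Generate with speculative steps, truncated to max_new_tokens."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + max_new_tokens]


def greedy(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain greedy decoding with the target model only, for comparison."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out


def toy_target(ctx: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

With greedy verification the output matches target-only decoding exactly; the draft model only changes how many target steps are needed, which is why the feature is attractive on CPUs where each target step is expensive.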

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
