vllm-project / vllm Public

Fork 8.8k
Star 52.9k

Pull requests: vllm-project/vllm

Labels 56 Milestones 1

848 Open 10,205 Closed

Pull requests list

[CI/Build] Fix test failure due to updated model repo
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
#21375 opened Jul 22, 2025 by DarkLight1337
1 of 4 tasks
Milestone: v0.10.0

Set MAX_AUDIO_CLIP_FILESIZE_MB via env var instead of hardcoding
Labels: frontend
#21374 opened Jul 22, 2025 by deven-labovitch

[Docs] Add Expert Parallelism Initial Documentation
Labels: documentation (Improvements or additions to documentation)
#21373 opened Jul 22, 2025 by simon-mo
3 of 4 tasks

[Quantization] Enable BNB support for more MoE models

#21370 opened Jul 22, 2025 by jeejeelee • Draft

4 tasks

[Model] Add Hunyuan V1 Dense Model support
Labels: new-model (Requests to new models)
#21368 opened Jul 22, 2025 by kzjeef
3 of 4 tasks

[V1][CUDA] Full cudagraph support for FlashInfer
Labels: rocm (Related to AMD ROCm)
#21367 opened Jul 22, 2025 by fhl2000
3 of 4 tasks

Refactor pa rocm
Labels: rocm
#21366 opened Jul 22, 2025 by vllmellm • Draft

fix: return {} for tool arguments when no argument is needed, so that…
Labels: frontend, tool-calling
#21365 opened Jul 22, 2025 by web3-luoxi
1 of 4 tasks

[Bugfix][Qwen][DCA] Fix bug in dual-chunk-flash-attn backend for Qwen 1M models
Labels: qwen (Related to Qwen models)
#21364 opened Jul 22, 2025 by sighingnow
1 of 4 tasks

[feat] Support EAGLE for Qwen2
Labels: new-model, qwen, speculative-decoding
#21363 opened Jul 22, 2025 by Ximingwang-09
3 of 4 tasks

[Bugfix] Fix hermes tool parser streaming bug when using function call
Labels: frontend, tool-calling
#21360 opened Jul 22, 2025 by LiuLi1998

[Bugfix] mm caching isn't tied to prefix caching
Labels: documentation, multi-modality (Related to multi-modality (#4194)), ready
#21358 opened Jul 22, 2025 by zucchini-nlp
Milestone: v0.10.0

[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI
Labels: ci/build, documentation, performance (Performance-related issues), tpu (Related to Google TPUs)
#21355 opened Jul 22, 2025 by yeqcharlotte
4 tasks done

[xpu] disable cudagraph for xpu platform

#21354 opened Jul 22, 2025 by chaojun-zhang • Draft

4 tasks

Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers
Labels: multi-modality, ready
#21353 opened Jul 22, 2025 by ariG23498

[Core][Feat] Add max-waiting-queue-length parameter to reject requests when waiting queue is full
Labels: frontend, v1
#21352 opened Jul 22, 2025 by chaunceyjiang
3 of 4 tasks

[Core] Minor comment and assert changes in block pool
Labels: v1
#21351 opened Jul 22, 2025 by Jialin
3 of 4 tasks

[AMD][BugFix] Fix omission of wvSplitK kernel due to torch.compile
Labels: rocm
#21350 opened Jul 22, 2025 by rasmith

[Core] Guided decoding v0 deprecation
Labels: ci/build, frontend, structured-output, v1
#21347 opened Jul 22, 2025 by rzabarazesh • Draft
1 of 4 tasks

[Speculative Decoding] Add speculators Config Support

#21345 opened Jul 22, 2025 by dsikka • Draft

[CI] Unifying Dockerfiles for ARM and X86 Builds
Labels: ci/build, documentation
#21343 opened Jul 22, 2025 by kebe7jun
3 of 4 tasks

[V1] Port xformers backend to v1
Labels: v1
#21342 opened Jul 22, 2025 by TheEpicDolphin • Draft

Add anthropic endpoint
Labels: documentation, frontend, tool-calling, v1
#21341 opened Jul 22, 2025 by SriRangaTarun • Draft

[TPU][Bugfix] Fix MoE layer
Labels: tpu
#21340 opened Jul 22, 2025 by yaochengji

Support DeepSeekV3-style block FP8 quantization with CT
Labels: deepseek (Related to DeepSeek models)
#21337 opened Jul 21, 2025 by mgoin
