Pull requests: vllm-project/vllm

Pull requests list

[CI/Build] Fix test failure due to updated model repo
Labels: ready
#21375 opened Jul 22, 2025 by DarkLight1337
1 of 4 tasks
v0.10.0
[Docs] Add Expert Parallelism Initial Documentation
Labels: documentation
#21373 opened Jul 22, 2025 by simon-mo
3 of 4 tasks
[Model] add Hunyuan V1 Dense Model support.
Labels: new-model
#21368 opened Jul 22, 2025 by kzjeef
3 of 4 tasks
[V1][CUDA] Full cudagraph support for FlashInfer
Labels: rocm, v1
#21367 opened Jul 22, 2025 by fhl2000
3 of 4 tasks
Refactor pa rocm
Labels: rocm, v1
#21366 opened Jul 22, 2025 by vllmellm Draft
[Bugfix][Qwen][DCA] fixes bug in dual-chunk-flash-attn backend for qwen 1m models.
Labels: qwen
#21364 opened Jul 22, 2025 by sighingnow
1 of 4 tasks
[feat] Support EAGLE for Qwen2
Labels: new-model, qwen, speculative-decoding
#21363 opened Jul 22, 2025 by Ximingwang-09
3 of 4 tasks
[Bugfix] mm caching isn't tied to prefix caching
Labels: documentation, multi-modality, ready, v1
#21358 opened Jul 22, 2025 by zucchini-nlp
v0.10.0
[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI
Labels: ci/build, documentation, performance, tpu
#21355 opened Jul 22, 2025 by yeqcharlotte
4 tasks done
[xpu] disable cudagraph for xpu platform
#21354 opened Jul 22, 2025 by chaojun-zhang Draft
4 tasks
Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers
Labels: multi-modality, ready
#21353 opened Jul 22, 2025 by ariG23498
[Core] Minor comments and asserts changes in block pool
Labels: v1
#21351 opened Jul 22, 2025 by Jialin
3 of 4 tasks
[AMD][BugFix] Fix omission of wvSplitK kernel due to torch.compile
Labels: rocm
#21350 opened Jul 22, 2025 by rasmith
[CI] Unifying Dockerfiles for ARM and X86 Builds
Labels: ci/build, documentation
#21343 opened Jul 22, 2025 by kebe7jun
3 of 4 tasks
Add anthropic endpoint
Labels: documentation, frontend, tool-calling, v1
#21341 opened Jul 22, 2025 by SriRangaTarun Draft
[TPU][Bugfix] fix moe layer
Labels: tpu, v1
#21340 opened Jul 22, 2025 by yaochengji
Support DeepSeekV3-style block FP8 quantization with CT
Labels: deepseek
#21337 opened Jul 21, 2025 by mgoin