Pull requests: vllm-project/vllm
[CI/Build] Fix test failure due to updated model repo
ready
ONLY add when PR is ready to merge/full CI is needed
Set MAX_AUDIO_CLIP_FILESIZE_MB via env var instead of hardcoding
frontend
#21374
opened Jul 22, 2025 by
deven-labovitch
[Docs] Add Expert Parallelism Initial Documentation
documentation
Improvements or additions to documentation
#21373
opened Jul 22, 2025 by
simon-mo
3 of 4 tasks
[Model] add Hunyuan V1 Dense Model support.
new-model
Requests to new models
#21368
opened Jul 22, 2025 by
kzjeef
3 of 4 tasks
[V1][CUDA] Full cudagraph support for FlashInfer
rocm
Related to AMD ROCm
v1
#21367
opened Jul 22, 2025 by
fhl2000
3 of 4 tasks
fix: return {} for tool arguments when no argument is needed, so that…
frontend
tool-calling
#21365
opened Jul 22, 2025 by
web3-luoxi
1 of 4 tasks
[Bugfix][Qwen][DCA] fixes bug in dual-chunk-flash-attn backend for qwen 1m models.
qwen
Related to Qwen models
#21364
opened Jul 22, 2025 by
sighingnow
1 of 4 tasks
[feat] Support EAGLE for Qwen2
new-model
Requests to new models
qwen
Related to Qwen models
speculative-decoding
#21363
opened Jul 22, 2025 by
Ximingwang-09
3 of 4 tasks
[Bugfix] FIX hermes tool parser streaming bug when using function call
frontend
tool-calling
#21360
opened Jul 22, 2025 by
LiuLi1998
[Bugfix] mm caching isn't tied to prefix caching
documentation
Improvements or additions to documentation
multi-modality
Related to multi-modality (#4194)
ready
ONLY add when PR is ready to merge/full CI is needed
v1
[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI
ci/build
documentation
Improvements or additions to documentation
performance
Performance-related issues
tpu
Related to Google TPUs
#21355
opened Jul 22, 2025 by
yeqcharlotte
4 tasks done
[xpu] disable cudagraph for xpu platform
#21354
opened Jul 22, 2025 by
chaojun-zhang
Draft
4 tasks
Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers
multi-modality
Related to multi-modality (#4194)
ready
ONLY add when PR is ready to merge/full CI is needed
#21353
opened Jul 22, 2025 by
ariG23498
[Core][Feat] Add max-waiting-queue-length parameter to reject requests when waiting queue is full
frontend
v1
#21352
opened Jul 22, 2025 by
chaunceyjiang
3 of 4 tasks
[Core] Minor comments and asserts changes in block pool
v1
#21351
opened Jul 22, 2025 by
Jialin
3 of 4 tasks
[AMD][BugFix] Fix omission of wvSplitK kernel due to torch.compile
rocm
Related to AMD ROCm
#21350
opened Jul 22, 2025 by
rasmith
[Core] Guided decoding v0 deprecation
ci/build
frontend
structured-output
v1
#21347
opened Jul 22, 2025 by
rzabarazesh
Draft
1 of 4 tasks
[CI] Unifying Dockerfiles for ARM and X86 Builds
ci/build
documentation
Improvements or additions to documentation
#21343
opened Jul 22, 2025 by
kebe7jun
3 of 4 tasks
Add anthropic endpoint
documentation
Improvements or additions to documentation
frontend
tool-calling
v1
#21341
opened Jul 22, 2025 by
SriRangaTarun
Draft
[TPU][Bugfix] fix moe layer
tpu
Related to Google TPUs
v1
#21340
opened Jul 22, 2025 by
yaochengji
Support DeepSeekV3-style block FP8 quantization with CT
deepseek
Related to DeepSeek models
#21337
opened Jul 21, 2025 by
mgoin