Pull requests: vllm-project/vllm
Pull requests list
Enable sequence parallelism for full cuda graph without specifying compile sizes
#21031
opened Jul 16, 2025 by
cascade812
[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group
#21024
opened Jul 16, 2025 by
Kevin-XiongC
4 tasks
Add the instruction to run e2e validation manually before release
documentation
Improvements or additions to documentation
#21023
opened Jul 16, 2025 by
huydhn
4 tasks done
[XPU] Enable external_launcher to serve as an executor via torchrun
v1
#21021
opened Jul 16, 2025 by
chaojun-zhang
4 tasks
[Misc] Minor comment reorganization in capture_model()
v1
#21015
opened Jul 15, 2025 by
ruisearch42
3 of 4 tasks
[Docker] Allow FlashInfer to be built in the ARM CUDA Dockerfile
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
#21013
opened Jul 15, 2025 by
mgoin
4 tasks
[protocol] Add request_id to the Request object so they can be controlled better via external load balancers
frontend
ready
ONLY add when PR is ready to merge/full CI is needed
#21009
opened Jul 15, 2025 by
kouroshHakha
4 tasks
[Not for merge] Unshift eagle prefill
documentation
Improvements or additions to documentation
llama
Related to Llama models
needs-rebase
new-model
Requests to new models
speculative-decoding
v1
#21008
opened Jul 15, 2025 by
morgendave
Draft
4 tasks
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue
performance
Performance-related issues
v1
#21005
opened Jul 15, 2025 by
JialinOuyang-Meta
Start using py3.12 for TPU.
ci/build
documentation
Improvements or additions to documentation
tpu
Related to Google TPUs
#21000
opened Jul 15, 2025 by
vanbasten23
3 of 4 tasks
[Misc] unify variable for LLM instance
documentation
Improvements or additions to documentation
llama
Related to Llama models
qwen
Related to Qwen models
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#20996
opened Jul 15, 2025 by
andyxning
4 tasks
[Performance] EPLB Execution Optimization
v1
#20990
opened Jul 15, 2025 by
david6666666
Draft
4 of 9 tasks
[CI] [Doc]: Add GH Action for auto labeling issues with rocm tag
ci/build
rocm
Related to AMD ROCm
#20988
opened Jul 15, 2025 by
vllmellm
3 of 4 tasks
Delete the unused parameter comments in the make method of FusedMoEParallelConfig
#20985
opened Jul 15, 2025 by
zhanghw0354
fix(completion): always include usage
frontend
#20983
opened Jul 15, 2025 by
max-wittig
3 of 4 tasks
[Kernel] DeepGemm MoE : Integrate cuda moe permute/unpermute
performance
Performance-related issues
#20982
opened Jul 15, 2025 by
varun-sundar-rabindranath
Draft
fix: Handle unsupported message fields in tool calling
#20973
opened Jul 15, 2025 by
ejrtks1020
add support for qwen3 moe model EPLB
qwen
Related to Qwen models
#20967
opened Jul 15, 2025 by
hsliuustc
2 of 4 tasks
Fix tool_calls to fit with openai client
frontend
#20966
opened Jul 15, 2025 by
relic-yuexi
3 of 4 tasks