Pull requests: vllm-project/vllm
Pull requests list
[XPU] Fix AWQ skipped layer detection in IPEX quantization
ready
ONLY add when PR is ready to merge/full CI is needed
#29774
opened Dec 1, 2025 by
faaany
3 of 5 tasks
[ROCm] [Fused Moe] Use binary expert mask for aiter fused moe kernel
rocm
Related to AMD ROCm
#29773
opened Dec 1, 2025 by
ZhiweiYan-96
5 tasks
[Draft] AFD basic implementation
deepseek
Related to DeepSeek models
documentation
Improvements or additions to documentation
frontend
v1
#29772
opened Dec 1, 2025 by
jiangkuaixue123
Draft
11 tasks
[Misc] Throw error on unintended access to scheduler_config.max_model_len
#29771
opened Dec 1, 2025 by
frank-wei
5 tasks
[Bugfix][MM] Move grid_thw tensor to cpu before directly converting to numpy
qwen
Related to Qwen models
ready
ONLY add when PR is ready to merge/full CI is needed
#29770
opened Dec 1, 2025 by
shen-shanshan
5 tasks
Refactor MLA attention: move prefill logic to layer.py
rocm
Related to AMD ROCm
v1
#29769
opened Dec 1, 2025 by
therealnaveenkamal
1 of 5 tasks
Bump actions/setup-python from 6.0.0 to 6.1.0
ci/build
dependencies
Pull requests that update a dependency file
github_actions
Pull requests that update GitHub Actions code
#29768
opened Dec 1, 2025 by
dependabot
bot
[Misc] Unify tokenizer registration
frontend
ready
ONLY add when PR is ready to merge/full CI is needed
structured-output
v1
#29767
opened Dec 1, 2025 by
DarkLight1337
5 tasks
[Bugfix]: Fix missing SPLIT_K in GPTQ/AWQ MoE Triton config
#29766
opened Dec 1, 2025 by
aaarkai
3 of 5 tasks
[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#29764
opened Dec 1, 2025 by
chaunceyjiang
5 tasks
Add Mistral Large 3
deepseek
Related to DeepSeek models
new-model
Requests to new models
ready
ONLY add when PR is ready to merge/full CI is needed
speculative-decoding
v1
#29757
opened Nov 30, 2025 by
juliendenize
1 of 5 tasks
Fix KV cache sync issue during CUDA graph replay
kv-connector
nvidia
#29755
opened Nov 30, 2025 by
yashwantbezawada
docs: add guide to reduce PyTorch profiler overhead via env vars (#29564)
documentation
Improvements or additions to documentation
#29753
opened Nov 30, 2025 by
kbp4154
[Feature]Add EVS (Efficient Video Sampling) Support for Qwen3-VL
qwen
Related to Qwen models
#29752
opened Nov 30, 2025 by
skyloevil
[crashfix] Eagle + multimodal can crash on mm cache miss
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#29750
opened Nov 30, 2025 by
mickaelseznec
5 tasks
[MoE-FP8-modelopt] Add FlashInfer alignment padding for intermediate dimensions
#29748
opened Nov 30, 2025 by
danielafrimi
[FEAT][Performance]Optimize native sampling by removing expensive scatter
v1
#29746
opened Nov 30, 2025 by
Flink-ddd
Add KV Cache Memory Estimator Example Script
documentation
Improvements or additions to documentation
#29736
opened Nov 29, 2025 by
ksenthilnathan02
[Feature][#29390]: Add timeout support to MultiprocExecutor.collective_rpc and FutureWrapper
v1
#29733
opened Nov 29, 2025 by
SandishKumarHN
5 tasks
[Frontend] Add streaming tool-call support to Responses API (non-Harmony)
frontend
gpt-oss
Related to GPT-OSS models
#29726
opened Nov 29, 2025 by
sumitaryal
5 tasks
[V1][Spec Decode] Optimize Medusa proposer to avoid GPU-CPU sync
speculative-decoding
v1
#29723
opened Nov 29, 2025 by
dongbo910220
5 tasks
[WIP][Feat][Sched] Support Balance Scheduling
v1
#29721
opened Nov 29, 2025 by
GDzhu01
5 tasks