Changes from all commits (275 commits)
5637508
[Rocm] [quantization] Fix quark ptpc moe and add test case (#24649)
haoyangli-amd Sep 17, 2025
0f3fc8e
Add more documentation and improve usability of lognormal dist (bench…
pliops-daniels Sep 17, 2025
ce74366
[XPU] Fix xpu model runner call torch.cuda APIs (#25011)
jikunshang Sep 17, 2025
b10a3c3
[EPLB] Support EPLB for Mixtral Model (#22842)
rouchenzi Sep 17, 2025
281e11e
[Core][MultiModalHasher] Hash images without converting image mode (#…
lgeiger Sep 17, 2025
52a69b8
[Model] Pass param prefix to LLMHead (#24862)
whx-sjtu Sep 17, 2025
fa87338
[Model] Apply SharedFusedMoE to glm4_moe. (#24849)
whx-sjtu Sep 17, 2025
332a076
[Core] Remove tokenizer group in vLLM (#24078)
zhuohan123 Sep 17, 2025
55f4643
[Docs] Fix griffe warning in base_static_graph.py (#25018)
windsonsea Sep 17, 2025
c37895f
[DP] Create placement groups by ray_device_key (#25026)
xinyu-intel Sep 17, 2025
f94602f
[Frontend] Support returning all prompt logprobs (#24956)
chaunceyjiang Sep 17, 2025
5938e5f
[BugFix] enable DOTALL to match multi-line tool_call parameters in ex…
shijun-yin Sep 17, 2025
5152935
[Misc] Avoid use of deprecated `AutoModelForVision2Seq` (#25065)
DarkLight1337 Sep 17, 2025
b9578f7
Add RADIO Vision Encoder Support to vLLM (#24595)
danielafrimi Sep 17, 2025
d9c268a
[Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check…
bigPYJ1151 Sep 17, 2025
b0b2bc0
Apply fixes for CUDA 13 (#24599)
Aidyn-A Sep 17, 2025
3da495d
[fix] lora benchmarks pass no_lora_flag_cpu (#23774)
dolpm Sep 17, 2025
250ac06
[Bugfix][Qwen3-Next] fixes the varlen issue in qwen3-next's MTP imple…
sighingnow Sep 17, 2025
9b7dd51
[Docs] improve code formatting and comments for eliminate griffe buil…
samzong Sep 17, 2025
e86407a
Remove old cutlass mla (#23961)
MatthewBonanni Sep 17, 2025
2db13d9
[Docs] vllm/benchmarks/datasets.py fix docstring param format. (#24970)
samzong Sep 17, 2025
17963ee
[CI Bugfix] Fix failing test_invalid_env (#25078)
mgoin Sep 17, 2025
b25143e
[V0 Deprecation] Remove V0 Core tests (#25082)
WoosukKwon Sep 17, 2025
6b9aa58
cleanup: remove adapter commons (#25045)
simon-mo Sep 17, 2025
e13c47b
Remove unused find_cuda_init helper script (#25044)
simon-mo Sep 17, 2025
8ad9525
[V0 Deprecation] Remove unused output processor util (#25023)
WoosukKwon Sep 17, 2025
3c73615
Change log level from info to debug for IOProcessor (#24999)
mgoin Sep 17, 2025
d359191
[CI] Revert back prepare_prompts and check_answers (#25087)
WoosukKwon Sep 17, 2025
a3e4f9b
[V0 Deprecation] Remove V0 tests in test_sequence.py (#25088)
WoosukKwon Sep 17, 2025
34256b5
[CI Bugfix] Fix failing test_model_load_with_params tests due to toke…
mgoin Sep 17, 2025
2e20b80
[V1] Logits processor docs (#22919)
afeldman-nm Sep 17, 2025
767aa32
[Misc] Update owners for KV connector and V1 offloading (#25041)
ApostaC Sep 17, 2025
55aacf3
[Bugfix] Update import path for bc_linter_include (#24766)
mmangkad Sep 17, 2025
19df881
[BUG] Exclude .pth files when pulling remote files (#25092)
ahao-anyscale Sep 17, 2025
5b2491c
[Kernel] Faster pre-processing time for W4A8 (#23972)
czhu-cohere Sep 17, 2025
156ba2e
[gpt-oss][2] fix types for streaming (#24556)
qandrew Sep 17, 2025
518e13c
[Bugfix][B200] Fix `cutlass_mla` hang (#24966)
alexm-redhat Sep 17, 2025
85262ca
Aiter mha fp8 fix (#24991)
dllehr-amd Sep 17, 2025
578590d
Disable failing GPT-OSS Eval (Blackwell) for now (#25107)
mgoin Sep 17, 2025
db9c66e
[Bugfix] Refactor Flashinfer TRTLLM attention kernel selection logic …
elvischenv Sep 17, 2025
8443141
Add a batched auto tune script (#25076)
karan Sep 17, 2025
f8ffbd3
[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel …
elvischenv Sep 17, 2025
81eb616
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMeth…
bnellnm Sep 17, 2025
7fc8d9f
[V0 Deprecation] Remove V0 Engine tests (#25114)
WoosukKwon Sep 18, 2025
b2f3121
[V0 Deprecation] Remove V0 Tracing & Metrics tests (#25115)
WoosukKwon Sep 18, 2025
dabeeca
[V0 Deprecation] Remove misc V0 tests (#25118)
WoosukKwon Sep 18, 2025
122b9a6
[V0 Deprecation] Skip PP test (#25128)
WoosukKwon Sep 18, 2025
6beb93b
[Kernels] Enable DeepGEMM by default (#24462)
bnellnm Sep 18, 2025
e1f10bd
[MM Encoder] Apply DP ViT for Qwen3-VL model series (#24955)
ywang96 Sep 18, 2025
e9f59f7
[Docs] Clean up the contributing README (#25099)
hmellor Sep 18, 2025
91780fa
[Core][MM] Cleanup `MultiModalCache` (#25006)
lgeiger Sep 18, 2025
85b06fe
[Bugfix][Qwen3-Next] add prefixes to shared_expert in qwen3-next and …
toncao Sep 18, 2025
e4dd827
[Kernels] Overlap shared experts with combine instead of dispatch (#2…
bnellnm Sep 18, 2025
342a80b
[Model] enable data parallel for InternVL vision encoder (#23909)
666even666 Sep 18, 2025
5519ecb
Mark prompt logprobs as incompatible with prompt embeds at API level …
qthequartermasterman Sep 18, 2025
5e3ce21
[XPU] Whisper model support on XPU Platform (#25123)
chaojun-zhang Sep 18, 2025
5c9b9db
[EPLB] Add EPLB support for hunyuan_v1 (#23078)
666even666 Sep 18, 2025
acfc54a
[V0 Deprecation] Remove more V0 tests (#25117)
WoosukKwon Sep 18, 2025
da33ab3
[Spec Decode] Efficient padded speculation (#24539)
benchislett Sep 18, 2025
f478dd7
[benchmark] add peak throughput metrics and plot (#23867)
simon-mo Sep 18, 2025
06c1a99
[CLI] Use streaming in CLI chat and completion commands (#23769)
simon-mo Sep 18, 2025
9d57ed1
[Kernel] Better inf handling for grouped topk cu (#24886)
lumina37 Sep 18, 2025
d435bd2
[Docs] Fix API Reference (#25140)
hmellor Sep 18, 2025
c0cfada
Retrieve `sliding_window` from text config in Gemma3 MM (#25085)
hmellor Sep 18, 2025
174a1f9
[Bugfix] when use s3 model cannot use default load_format (#24435)
lengrongfu Sep 18, 2025
b315077
[Qwen] Add fp8 checkpoint support for qwen3-next. (#25079)
sighingnow Sep 18, 2025
a2fe912
Add 'path' option to ImagePrompt data_format (#25081)
gfinol Sep 18, 2025
6a7d69c
[Doc] Fix cross-reference warnings (#25058)
punitvara Sep 18, 2025
a287083
[Chore] Cleanup guided namespace, move to structured outputs config (…
aarnphm Sep 18, 2025
824a8b3
Fix: Add explicit #include <omp.h> for OpenMP compatibility on certai…
ihb2032 Sep 18, 2025
ec54c45
silu-v1: Fix EPS not being used during max-reduction (#25069)
elvircrn Sep 18, 2025
7fa6136
[Frontend] Support setting logprobs to -1 (#25031)
chaunceyjiang Sep 18, 2025
ecc27c3
[Model] Improve Pooling Model (#25149)
jeejeelee Sep 18, 2025
575dd14
Move `StructuredOutputsConfig` from `config/__init__.py` to `config/s…
hmellor Sep 18, 2025
2a8a181
[Docs] Fix pooling-params doc references in openai_compatible_server.…
yankay Sep 18, 2025
eceb0ba
[Docs] add the parallel sampling usage in LLMEngine and AsyncLLM (#24…
gigit0000 Sep 18, 2025
7d3e36f
Fix forward reference warning in documentation (#25150)
hmellor Sep 18, 2025
29dbe7e
Fix `validate-config` pre-commit check (#25157)
hmellor Sep 18, 2025
6d9fdc4
[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support (#24883)
Josephasafg Sep 18, 2025
6dddc13
[Misc] Clean up flags in `vllm bench serve` (#25138)
ywang96 Sep 18, 2025
4827f22
[Structured Output][Refactor] Move `apply_grammar_bitmask()` method f…
shen-shanshan Sep 18, 2025
6b2bf06
Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#…
mgoin Sep 18, 2025
b50707a
[Misc] Add kv-connector label (#25156)
NickLucche Sep 18, 2025
a0ee87e
[Kernel] Decouple Tile Size from Block Size in Triton Unified Attenti…
jvlunteren Sep 18, 2025
af8824a
[PERF] Add `conv1d` metadata to GDN attn (#25105)
vadiklyutiy Sep 18, 2025
ad12e8b
feat(api): Return 503 on /health when engine is dead (#24897)
dongbo910220 Sep 18, 2025
aebda1b
[New Model] Support BertForTokenClassification / Named Entity Recogni…
noooop Sep 18, 2025
e2fc44f
[Docs] Fix warnings in mkdocs build (continued) (#25163)
Zerohertz Sep 18, 2025
511ba8e
Enable Allgather/ReduceScatter backend for NaiveAllToAll (#23964)
wenscarl Sep 18, 2025
8f1bc71
[Misc] Add codeowner for Transformers backend (#25180)
hmellor Sep 18, 2025
9ef6f59
[spec decode] Fix MTP inference path for MiMo-7B model (#25136)
zixi-qi Sep 18, 2025
3d01678
[ROCm][CI/Build] Use ROCm7.0 as the base (#25178)
gshtras Sep 18, 2025
4862318
fix aiter fp8 linear support
charlifu Sep 18, 2025
40f72cb
[ROCm][AITER][Bugfix] Switch AITER to use PIECEWISE_AND_FULL compilat…
Rohan138 Sep 18, 2025
8ff0307
[KV offload][1/N] Introduce an offloading component (#19848)
orozery Sep 18, 2025
1dddf7d
[V0 Deprecation] Remove AsyncLLMEngine (#25025)
WoosukKwon Sep 18, 2025
4daf33f
[fix]: remove data type hardcoding from gptoss model implementation (…
nikhil-arm Sep 18, 2025
6004fe2
[feat]: Create interface for model-specific M-RoPE (#24194)
AzizCode92 Sep 18, 2025
32e9b63
[Bug] Fix `returned_lse` not Defined issue (#25106)
yewentao256 Sep 18, 2025
335003b
[Bug] Fix torch Compilation Cache Hit Error (#25093)
yewentao256 Sep 18, 2025
a36d187
[V0 Deprecation] Remove unused async_timeout.py (#25190)
WoosukKwon Sep 18, 2025
22e5939
[KV offload][1b/N] rename offloading to kv_offload (#25191)
orozery Sep 18, 2025
2f3e391
[BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206)
LucasWilkinson Sep 18, 2025
eb3e6a0
[CORE] Prompt Embeddings Support for v1 Engine (#24278)
qthequartermasterman Sep 19, 2025
0e4cffa
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
orozery Sep 19, 2025
5d53d0e
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartD…
qandrew Sep 19, 2025
aca56ad
[Perf] Optimize memory peak during EAGLE model loading. (#24585)
candyzone Sep 19, 2025
cbf9296
[Misc] Clean up MM profiling warnings (#25222)
ywang96 Sep 19, 2025
dc7b734
[Docs] Fix griffe warnings in vllm/multimodal (#25216)
windsonsea Sep 19, 2025
6336dcb
[OOT] Support sync_model_loading for OOT (#25126)
xuechendi Sep 19, 2025
8614786
[Build] Update Xgrammar to 0.1.24 to get a CVE fix (#25188)
russellb Sep 19, 2025
bcbfd0f
[CPU] Disable oneDNN linear on non-x86 platforms (#25166)
bigPYJ1151 Sep 19, 2025
711fea6
[Bugfix][CPU] Add placeholder to avoid import errors when using fused…
bigPYJ1151 Sep 19, 2025
11bc40c
[Misc] Cleanup test conftest for deprecated encoder-decoder models (#…
Isotr0py Sep 19, 2025
6d9f023
[bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B (#25146)
yma11 Sep 19, 2025
7f3073c
[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoP…
Isotr0py Sep 19, 2025
7111624
[Bugfix][Perf] Misc fixes for Qwen3 VL (#25238)
ywang96 Sep 19, 2025
769a3f6
Move `PoolerConfig` from `config/__init__.py` to `config/pooler.py` (…
hmellor Sep 19, 2025
09cbfe3
[P/D][Nixl] Introduce `KVTransferMetrics` and aggregation strategy (#…
NickLucche Sep 19, 2025
a84292b
[V0 Deprecation] Remove V0 logic from `get_input_embeddings` interfac…
DarkLight1337 Sep 19, 2025
fec2cda
[Qwen] Remove cuda hard-code in qwen3 next (#25243)
wxsIcey Sep 19, 2025
f0af653
Update CODEOWNERS (#25269)
hmellor Sep 19, 2025
a68facf
Move `ModelConfig` from `config/__init__.py` to `config/model.py` (#2…
hmellor Sep 19, 2025
e434176
refactor(benchmarks): add type annotations to wait_for_endpoint param…
samzong Sep 19, 2025
56191dc
[KV offload][3/N] Add worker-side CPU support (#21448)
orozery Sep 19, 2025
0ef2db8
[Frontend] Pass API server count to each process (#23717)
DarkLight1337 Sep 19, 2025
8d1136f
[Core] Modify the initialization parameters of the lora manager (#25249)
jeejeelee Sep 19, 2025
f8b4bcb
Remove Redundant Assignment in Qwen3_VisionPatchMerger (#25224)
LJH-LBJ Sep 19, 2025
bd54073
Encoder model support for the Transformers backend (#25174)
hmellor Sep 19, 2025
9dd110d
[CI/Build] fix test function_calling (#25072)
chaunceyjiang Sep 19, 2025
686863d
[Core][Prefix Hash] Fix prefix hash metrics sliding window maintainan…
Jialin Sep 19, 2025
1da205f
[Docs] add __init__.py to vllm/model_executor/layers/quantization/com…
samzong Sep 19, 2025
a173473
[bugfix] fix structured outputs key missing issue from #24929 (#25195)
luccafong Sep 19, 2025
eae22a8
[KV offload][4/N] Offloading KV connector (#22595)
orozery Sep 19, 2025
e77a52f
Optimize triton unified attention performance for sliding window atte…
zixi-qi Sep 19, 2025
c6be3f3
[Bugfix] GPT OSS Attribute error on H100 (#25228)
varun-sundar-rabindranath Sep 19, 2025
972feb7
[Bugfix] Fix chunked a2_scales in modular kernels (#25264)
bnellnm Sep 19, 2025
4a31275
Specify platform in `pip-compile` `pre-commit` hook so it runs on Mac…
hmellor Sep 19, 2025
f7d8e68
[Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when avai…
mgoin Sep 19, 2025
9495448
[BugFix] Make FlashInferMetadataBuilder non-blocking (#25040)
nvjullin Sep 19, 2025
8a7991b
Fix: Correct FusedMoE layer reference in auto_round quantization (#24…
David-Wen2025 Sep 19, 2025
f3ef285
[Frontend] Responses API messages out, just harmony for now (#24985)
alecsolder Sep 19, 2025
fe25c24
[Compile] Fix Compile Warning for Ignoring `MIN_BLOCK_PER_SM` (#25193)
yewentao256 Sep 19, 2025
6f02ce1
Enable modelopt gemma3 nvfp4/fp8, make workflow more robust (#22771)
Edwardf0t1 Sep 19, 2025
3abedfb
allow disable flashinfer prefill (#25276)
luccafong Sep 19, 2025
f18f38b
[BugFix] Fix async scheduling CPU tensor race take 2 (#25279)
njhill Sep 19, 2025
5fde13b
[Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE #2969 (#25090)
Lucaskabela Sep 20, 2025
1df2fd0
Don't skip special tokens with hermes-style tool calling (#25281)
maxdebayser Sep 20, 2025
e585586
test: Remove vestigial skip for prompt embeds tests after landing v1 …
qthequartermasterman Sep 20, 2025
767ae97
[docs] Prompt Embedding feature support (#25288)
qthequartermasterman Sep 20, 2025
38db70e
[torch.compile] CUDAGraph Inductor partition integration (#24281)
BoyuanFeng Sep 20, 2025
a0cee55
[BugFix] Ensure appropriate guards in destructors (#25284)
njhill Sep 20, 2025
b8eefb7
[Misc] Support more collective_rpc return types (#25294)
njhill Sep 20, 2025
c0d7622
Improve weight loading for encoder models in Transformers backend (#2…
hmellor Sep 20, 2025
9e54a65
[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (A…
JartX Sep 20, 2025
e11fe87
[BugFix] Exclude self when checking for port collision (#25286)
njhill Sep 20, 2025
6ddda58
[BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attent…
xuechendi Sep 20, 2025
3e213d4
[Bugfix] fix tool call arguments is empty (#25223)
chaunceyjiang Sep 20, 2025
df0d15d
[Optimization] Avoid repeated model architecture conversion for pooli…
DarkLight1337 Sep 20, 2025
fd82e53
[Hybrid Allocator] Support full attention with different hidden size …
heheda12345 Sep 20, 2025
112acee
[Bugfix] Fix Qwen3-VL-MoE weight loading for EP (#25300)
ywang96 Sep 20, 2025
1dfc574
[V1] Support `LLM.apply_model` (#18465)
DarkLight1337 Sep 20, 2025
0b92683
[CI Failure] Disable FlashInfer RoPE to unblock CI (#25299)
mgoin Sep 20, 2025
18cacbe
[Docs] Fix warnings in mkdocs build (continued) (#25042)
wwl2755 Sep 20, 2025
ed9c0e9
Generate _ModelInfo properties file when loading to improve loading …
manoelmarques Sep 20, 2025
444e13b
[Model] Cleanup InternViT's data parallel implementation (#25306)
Isotr0py Sep 20, 2025
3de3bdf
[Core] Enable sharded state loader for V1 engine and enhance test cov…
lirong-lirong Sep 20, 2025
1b3942c
[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307)
DarkLight1337 Sep 20, 2025
ef3794d
[Docs] Fix warnings in vllm/profiler and vllm/transformers_utils (#25…
windsonsea Sep 20, 2025
8e6b65b
[V0 Deprecation] Remove LLMEngine (#25033)
WoosukKwon Sep 21, 2025
db3bcb3
[V0 Deprecation] Remove V0 Output Processor (#25320)
WoosukKwon Sep 21, 2025
379208c
[Chore] Remove unused sampler in models (#25324)
WoosukKwon Sep 21, 2025
61fd7db
[CI] Skip tests failing on main (#25326)
WoosukKwon Sep 21, 2025
6bdf03e
[V0 Deprecation] Remove V0 core (#25321)
WoosukKwon Sep 21, 2025
25b19d9
[Doc] improve test-pipeline.yaml documentation (#25305)
hl475 Sep 21, 2025
8e29de2
[V0 Deprecation] Remove V0 model runner base & simplify worker base (…
WoosukKwon Sep 21, 2025
17391b2
[Multi Modal][Performance] Fused Q,K's apply_rope in more models (#25…
wwl2755 Sep 21, 2025
b9ca787
[V0 Deprecation] Remove from_seq_group methods (#25330)
WoosukKwon Sep 21, 2025
719ab17
[V0 Deprecation] Remove V0 MP executor (#25329)
WoosukKwon Sep 21, 2025
3df516f
[V1] Add sliding window support to Flex Attention backend (#24089)
Isotr0py Sep 21, 2025
aacff96
[MM][Perf] Minor Optimization on Qwen3-VL `fast_pos_embed_interpolate…
ywang96 Sep 21, 2025
019c2bd
[Bugfix] Typos in error message for missing model config file (#25339)
simondanielsson Sep 21, 2025
db4de66
[Optimization] Cache chat template result when processor fails to be …
DarkLight1337 Sep 21, 2025
4bd4671
[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332)
WoosukKwon Sep 21, 2025
909ef08
[V0 Deprecation] Remove async_output_proc, preemption mode, delay fac…
WoosukKwon Sep 21, 2025
ac209ef
feat: Enable engine-level arguments with speculators models (#25250)
rahul-tuli Sep 21, 2025
aa7291b
[V0 Deprecation] Remove V0 sampling metadata (#25345)
WoosukKwon Sep 21, 2025
0cb73c3
[Perf] Further optimization for Qwen3-VL `fast_pos_embed_interpolate`…
Isotr0py Sep 21, 2025
04aabb9
Remove V0 attention backends (#25351)
WoosukKwon Sep 21, 2025
0a93017
[Bugfix][V0 Deprecation][CI] use async mock and await for async metho…
KKSK-DON Sep 21, 2025
bcca00b
Multimodal - audio tests (#25285)
debroy-rh Sep 21, 2025
5804278
[Model] Support Dots OCR (#24645)
ywang96 Sep 22, 2025
756a3d8
[Docs] GSM8K Accuracy Evaluation doc update (#25360)
david6666666 Sep 22, 2025
82e8e2a
[Bugfix] Fix hermes tool parser handling of non-string argument types…
david6666666 Sep 22, 2025
acaec57
[V0 Deprecation] Remove V0-only methods in multi-modal registry (#25362)
DarkLight1337 Sep 22, 2025
6d2b67d
[V0 Deprecation] Remove `MultiModalPlaceholderMap` (#25366)
DarkLight1337 Sep 22, 2025
d5f9b30
Enable Eagle3 speculative decoding for GPT-OSS model (#25246)
eldarkurtic Sep 22, 2025
e325bcf
[TPU][Bugfix][CI] Fix broken tests/build dependency (#25255)
NickLucche Sep 22, 2025
92e1bc6
[TPU] Deprecate `xm.mark_step` in favor of `torch_xla.sync` (#25254)
NickLucche Sep 22, 2025
fd265e0
refactor: abstract graph mode support into platform interface (#25161)
yiz-liu Sep 22, 2025
106d6cd
[Misc] Remove unused encoder-decoder error strings (#25374)
DarkLight1337 Sep 22, 2025
7b9a38a
Make pickle import check fast (#25379)
hmellor Sep 22, 2025
713132f
Make `mypy` behave like a proper pre-commit hook (#25313)
hmellor Sep 22, 2025
2768858
[Kernel] MI-300X triton moe configs (#23445)
Sara-KS Sep 22, 2025
9901ac1
[Bugfix] Fix several issues with p2p xPyD in GET type (#23993)
Csrayz Sep 22, 2025
2d5ffdc
[V1][Attention] Split triton_attn in triton-only and rocm specific ba…
bringlein Sep 22, 2025
c5f3b03
[EPLB] Reduce EPLB Inference Overhead (#24573)
abmfy Sep 22, 2025
acf8511
[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in en…
Daisy-Ma-coder Sep 22, 2025
3a814c8
[Compiler] Disable Inductor standalone compile by default (#25391)
ElizaWszola Sep 22, 2025
4902a1e
[CI Failure] Fix fp8 kv cache on <SM90 (#25396)
mgoin Sep 22, 2025
963e215
[DP] support torchrun external launcher with Data Parallelism (#24899)
luccafong Sep 22, 2025
413d33e
[misc] Remove RFC review hours reference (#25416)
simon-mo Sep 22, 2025
1fe97c4
[torch.compile] Cleanup compilation tests and custom passes, add debu…
ProExpertProg Sep 22, 2025
83626f1
[KV offload][5/N] Add `CPUOffloadingSpec` (#24251)
orozery Sep 22, 2025
fc4ab54
[CI/Build] Skip Qwen3-VL initialization tests until models are actual…
DarkLight1337 Sep 22, 2025
f25fa36
[TPU] update torch_xla dependency for PyPI compatibility (#25278)
jcyang43 Sep 22, 2025
1361bf8
[Frontend] Responses API MCP tools for built in tools and to pass thr…
alecsolder Sep 22, 2025
483cab8
[Bugfix] fix custom op test (#25429)
ProExpertProg Sep 23, 2025
6139f55
[Core] Drop overly aggressive whisper assertion (#25408)
russellb Sep 23, 2025
5b1f0cb
[Bugfix] Fix missing `clear_connector_metadata` (#25397)
NickLucche Sep 23, 2025
fa8b17e
[BugFix] [DP/EP] Fix slow execution when BS <= DP (#25407)
MatthewBonanni Sep 23, 2025
42337d2
[Performance] Remove input pads in cutlass_mla and optimize v_proj ou…
alexm-redhat Sep 23, 2025
fd9423a
[Perf] Apply torch.compile for `per_block_cast_to_fp8` (#24611)
yewentao256 Sep 23, 2025
c973ab7
[V0 deprecation] Remove platform v1 controling interface (#25410)
Isotr0py Sep 23, 2025
2d9b83d
[V0 deprecation] Remove `_set_default_args_v0` function (#25409)
Isotr0py Sep 23, 2025
c1c497a
[Bug] Fix Long Context OOM Issue (#25290)
yewentao256 Sep 23, 2025
a8a200a
[feat] Support MRoPE + YaRN (#25384)
JJJYmmm Sep 23, 2025
894e8e7
[XPU] Fix `compile_size` is `None` case. (#25433)
jikunshang Sep 23, 2025
27ac795
[benchmarks]allow skip ready check for bench serve (#25420)
luccafong Sep 23, 2025
4e73bb8
[Bugfix] Remove contiguous output req for context parallel MLA (#25414)
mgoin Sep 23, 2025
39f7d28
[Docs] Fix griffe warnings in vllm/lora/ops (#25369)
windsonsea Sep 23, 2025
31259d2
[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588)
varun-sundar-rabindranath Sep 23, 2025
d1f8b3b
[NIXL][OOT platform] support nixl_connector with oot platform and oth…
xuechendi Sep 23, 2025
befa4d0
[Model] Enable DP for ViT in Qwen2-VL (#25445)
DarkLight1337 Sep 23, 2025
570cbdd
Handle triton kernel import exception (#25319)
minosfuture Sep 23, 2025
0e12972
[Frontend] Add a new xml-based tool parser for qwen3-coder (#25028)
Zhikaiiii Sep 23, 2025
cba3e03
[Misc] Move DP for ViT code inside model executor dir (#25459)
DarkLight1337 Sep 23, 2025
8960ec2
[Test]: Hermes tool parser stream output error in Qwen3 case (#25203)
ahartel Sep 23, 2025
e078d4f
[Bugfix] Fix idefics3 `tie_word_embeddings` (#25454)
Isotr0py Sep 23, 2025
ae6abfe
[Core] Optimize LoRA weight loading (#25403)
jeejeelee Sep 23, 2025
ff0c1ea
[docs] Benchmark Serving Incorrect Arg (#25474)
vllmellm Sep 23, 2025
81bcdbb
[CI/Build] Fix disabled v1 attention backend selection test (#25471)
Isotr0py Sep 23, 2025
09e2cc7
[BugFix] Register expert_map as named buffer for wake_up and sleep (#…
wuxibin89 Sep 23, 2025
6e19eb2
[P/D] Support NIXL connector to disconnect during a clean shutdown (#…
chaunceyjiang Sep 23, 2025
54b16af
[Docs] NixlConnector quickstart guide (#24249)
panpan0000 Sep 23, 2025
d1509a2
[XPU] Fix MOE DP accuracy issue on XPU (#25465)
faaany Sep 23, 2025
ab59321
[UX] Change kv-cache-memory log level to debug (#25479)
mgoin Sep 23, 2025
42a395e
[V1] Remove V0 code paths for Hybrid models (#25400)
tdoublep Sep 23, 2025
f04b970
Use fusion pass to select AITER group quant RMSNorm and w8a8 gemm (#707)
micah-wil Sep 24, 2025
756ff93
add is rocm aiter linear enabled
charlifu Sep 25, 2025
10 changes: 0 additions & 10 deletions .buildkite/scripts/hardware_ci/run-amd-test.sh
@@ -86,10 +86,6 @@ if [[ $commands == *"pytest -v -s models/test_registry.py"* ]]; then
commands=${commands//"pytest -v -s models/test_registry.py"/"pytest -v -s models/test_registry.py -k 'not BambaForCausalLM and not GritLM and not Mamba2ForCausalLM and not Zamba2ForCausalLM'"}
fi

if [[ $commands == *"VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'not llama4 and not plamo2'"* ]]; then
commands=${commands//"VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'not llama4 and not plamo2'"/"VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'not llama4 and not plamo2 and not BambaForCausalLM and not Gemma2ForCausalLM and not Grok1ModelForCausalLM and not Zamba2ForCausalLM and not Gemma2Model and not GritLM'"}
fi

if [[ $commands == *"pytest -v -s compile/test_basic_correctness.py"* ]]; then
commands=${commands//"pytest -v -s compile/test_basic_correctness.py"/"VLLM_USE_TRITON_FLASH_ATTN=0 pytest -v -s compile/test_basic_correctness.py"}
fi
@@ -167,12 +163,6 @@ if [[ $commands == *" entrypoints/llm "* ]]; then
--ignore=entrypoints/llm/test_prompt_validation.py "}
fi

#Obsolete currently
##ignore certain Entrypoints/llm tests
#if [[ $commands == *" && pytest -v -s entrypoints/llm/test_guided_generate.py"* ]]; then
# commands=${commands//" && pytest -v -s entrypoints/llm/test_guided_generate.py"/" "}
#fi

# --ignore=entrypoints/openai/test_encoder_decoder.py \
# --ignore=entrypoints/openai/test_embedding.py \
# --ignore=entrypoints/openai/test_oot_registration.py
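For readers unfamiliar with the idiom used throughout this script, the `commands=${commands//"search"/"replace"}` lines rely on Bash's global pattern substitution. A minimal, self-contained sketch of the pattern; the command strings here are made up for illustration:

```bash
#!/usr/bin/env bash
# ${var//search/replace} replaces every occurrence of "search" in $var.
commands="pytest -v -s models/test_registry.py && pytest -v -s compile/test_basic_correctness.py"

# Guard with a substring match, then rewrite the matching command in place.
if [[ $commands == *"pytest -v -s models/test_registry.py"* ]]; then
  commands=${commands//"pytest -v -s models/test_registry.py"/"pytest -v -s models/test_registry.py -k 'not GritLM'"}
fi

echo "$commands"
# pytest -v -s models/test_registry.py -k 'not GritLM' && pytest -v -s compile/test_basic_correctness.py
```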
2 changes: 1 addition & 1 deletion .buildkite/scripts/hardware_ci/run-tpu-v1-test-part2.sh
@@ -62,7 +62,7 @@ echo "--- Installing Python dependencies ---"
python3 -m pip install --progress-bar off git+https://github.com/thuml/depyf.git \
&& python3 -m pip install --progress-bar off pytest pytest-asyncio tpu-info \
&& python3 -m pip install --progress-bar off "lm-eval @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d" \
&& python3 -m pip install --progress-bar off hf-transfer
&& python3 -m pip install --progress-bar off hf-transfer tblib==3.1.0
echo "--- Python dependencies installed ---"
export VLLM_USE_V1=1
export VLLM_XLA_CHECK_RECOMPILATION=1
2 changes: 1 addition & 1 deletion .buildkite/scripts/hardware_ci/run-tpu-v1-test.sh
@@ -62,7 +62,7 @@ echo "--- Installing Python dependencies ---"
python3 -m pip install --progress-bar off git+https://github.com/thuml/depyf.git \
&& python3 -m pip install --progress-bar off pytest pytest-asyncio tpu-info \
&& python3 -m pip install --progress-bar off "lm-eval @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d" \
&& python3 -m pip install --progress-bar off hf-transfer
&& python3 -m pip install --progress-bar off hf-transfer tblib==3.1.0
echo "--- Python dependencies installed ---"
export VLLM_USE_V1=1
export VLLM_XLA_CHECK_RECOMPILATION=1
98 changes: 49 additions & 49 deletions .buildkite/test-pipeline.yaml
@@ -6,24 +6,28 @@
# to generate the final pipeline yaml file.

# Documentation
# label(str): the name of the test. emoji allowed.
# fast_check(bool): whether to run this on each commit on fastcheck pipeline.
# torch_nightly(bool): whether to run this on vllm against torch nightly pipeline.
# fast_check_only(bool): run this test on fastcheck pipeline only
# optional(bool): never run this test by default (i.e. need to unblock manually) unless it's scheduled nightly run.
# label(str): the name of the test. emojis allowed.
# fast_check(bool): whether to run this on each commit on the fastcheck pipeline.
# torch_nightly(bool): whether to run this on vllm against the torch nightly pipeline.
# fast_check_only(bool): run this test on the fastcheck pipeline only
# optional(bool): never run this test by default (i.e. need to unblock manually) unless it's a scheduled nightly run.
# soft_fail(bool): allow this step to fail without failing the entire pipeline (useful for flaky or experimental tests).
# command(str): the single command to run for tests. incompatible with commands.
# commands(list): the list of commands to run for test. incompatbile with command.
# mirror_hardwares(list): the list of hardwares to run the test on as well. currently only supports [amd]
# gpu(str): override the GPU selection for the test. default is on L4 GPUs. currently only supports a100
# num_gpus(int): override the number of GPUs for the test. default to 1 GPU. currently support 2,4.
# num_nodes(int): whether to simulate multi-node setup by launch multiple containers on one host,
# in this case, commands must be specified. the first command runs on first host, the second
# commands(list): the list of commands to run for the test. incompatible with command.
# mirror_hardwares(list): the list of hardware to run the test on as well. currently only supports [amdexperimental]
# gpu(str): override the GPU selection for the test. default is L4 GPUs. supports a100, b200, h200
# num_gpus(int): override the number of GPUs for the test. defaults to 1 GPU. currently supports 2,4.
# num_nodes(int): whether to simulate multi-node setup by launching multiple containers on one host,
# in this case, commands must be specified. the first command runs on the first host, the second
# command runs on the second host.
# working_dir(str): specify the place where command should execute, default to /vllm-workspace/tests
# source_file_dependencies(list): the list of prefix to opt-in the test for, if empty, the test will always run.
# timeout_in_minutes(int): sets a timeout for the step in minutes. if not specified, uses the default timeout.
# parallelism(int): number of parallel jobs to run for this step. enables test sharding using $$BUILDKITE_PARALLEL_JOB
# and $$BUILDKITE_PARALLEL_JOB_COUNT environment variables.
# working_dir(str): specify the place where the command should execute, default to /vllm-workspace/tests
# source_file_dependencies(list): the list of prefixes to opt-in the test for, if empty, the test will always run.

# When adding a test
# - If the test belong to an existing group, add it there
# - If the test belongs to an existing group, add it there
# - If the test is short, add to any existing step
# - If the test takes more than 10min, then it is okay to create a new step.
# Note that all steps execute in parallel.
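Putting the documented keys together, a step definition might look like the following sketch; the label, paths, GPU choice, and command are invented for illustration and do not correspond to an actual step in this pipeline:

```yaml
- label: Example Kernel Test # 8min
  timeout_in_minutes: 15
  fast_check: true
  soft_fail: true
  gpu: h200
  num_gpus: 2
  working_dir: "/vllm-workspace/tests"
  source_file_dependencies:
  - vllm/model_executor/layers/
  - tests/kernels/test_example.py
  commands:
  - pytest -v -s kernels/test_example.py
```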
@@ -46,24 +50,18 @@ steps:
mirror_hardwares: [amdexperimental]
source_file_dependencies:
- vllm/
- tests/mq_llm_engine
- tests/async_engine
- tests/test_inputs.py
- tests/test_outputs.py
- tests/multimodal
- tests/utils_
- tests/worker
- tests/standalone_tests/lazy_imports.py
- tests/transformers_utils
commands:
- python3 standalone_tests/lazy_imports.py
- pytest -v -s mq_llm_engine # MQLLMEngine
- pytest -v -s async_engine # AsyncLLMEngine
- pytest -v -s test_inputs.py
- pytest -v -s test_outputs.py
- pytest -v -s multimodal
- pytest -v -s utils_ # Utils
- pytest -v -s worker # Worker
- pytest -v -s transformers_utils # transformers_utils

- label: Python-only Installation Test # 10min
@@ -84,25 +82,12 @@ steps:
- vllm/
- tests/basic_correctness/test_basic_correctness
- tests/basic_correctness/test_cpu_offload
- tests/basic_correctness/test_preemption
- tests/basic_correctness/test_cumem.py
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s basic_correctness/test_cumem.py
- pytest -v -s basic_correctness/test_basic_correctness.py
- pytest -v -s basic_correctness/test_cpu_offload.py
- VLLM_TEST_ENABLE_ARTIFICIAL_PREEMPT=1 pytest -v -s basic_correctness/test_preemption.py

- label: Core Test # 22min
timeout_in_minutes: 35
mirror_hardwares: [amdexperimental]
fast_check: true
source_file_dependencies:
- vllm/core
- vllm/distributed
- tests/core
commands:
- pytest -v -s core

- label: Entrypoints Unit Tests # 5min
timeout_in_minutes: 10
@@ -127,10 +112,9 @@ steps:
- tests/entrypoints/offline_mode
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/llm --ignore=entrypoints/llm/test_lazy_outlines.py --ignore=entrypoints/llm/test_generate.py --ignore=entrypoints/llm/test_collective_rpc.py
- pytest -v -s entrypoints/llm/test_lazy_outlines.py # it needs a clean process
- pytest -v -s entrypoints/llm --ignore=entrypoints/llm/test_generate.py --ignore=entrypoints/llm/test_collective_rpc.py
- pytest -v -s entrypoints/llm/test_generate.py # it needs a clean process
- VLLM_USE_V1=0 pytest -v -s entrypoints/offline_mode # Needs to avoid interference with other tests
- pytest -v -s entrypoints/offline_mode # Needs to avoid interference with other tests

- label: Entrypoints Integration Test (API Server) # 100min
timeout_in_minutes: 130
@@ -168,7 +152,6 @@ steps:
num_gpus: 4
source_file_dependencies:
- vllm/distributed/
- vllm/core/
- tests/distributed/test_utils
- tests/distributed/test_pynccl
- tests/distributed/test_events
@@ -182,11 +165,18 @@
- tests/v1/test_hybrid_lb_dp.py
- tests/v1/engine/test_engine_core_client.py
commands:
# test with tp=2 and external_dp=2
- VLLM_USE_V1=0 torchrun --nproc-per-node=4 distributed/test_torchrun_example.py
# test with torchrun tp=2 and external_dp=2
- torchrun --nproc-per-node=4 distributed/test_torchrun_example.py
# test with tp=2 and pp=2
# test with torchrun tp=2 and pp=2
- PP_SIZE=2 torchrun --nproc-per-node=4 distributed/test_torchrun_example.py
# test with torchrun tp=4 and dp=1
- TP_SIZE=4 torchrun --nproc-per-node=4 distributed/test_torchrun_example_moe.py
# test with torchrun tp=2, pp=2 and dp=1
- PP_SIZE=2 TP_SIZE=2 torchrun --nproc-per-node=4 distributed/test_torchrun_example_moe.py
# test with torchrun tp=1 and dp=4 with ep
- DP_SIZE=4 ENABLE_EP=1 torchrun --nproc-per-node=4 distributed/test_torchrun_example_moe.py
# test with torchrun tp=2 and dp=2 with ep
- TP_SIZE=2 DP_SIZE=2 ENABLE_EP=1 torchrun --nproc-per-node=4 distributed/test_torchrun_example_moe.py
# test with internal dp
- python3 ../examples/offline_inference/data_parallel.py --enforce-eager
- TP_SIZE=2 DP_SIZE=2 pytest -v -s v1/test_async_llm_dp.py
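For context on the torchrun matrix above: the test scripts are launched once per rank and are expected to read the `TP_SIZE`/`PP_SIZE`/`DP_SIZE`/`ENABLE_EP` variables and hand them to vLLM's external-launcher backend. A minimal sketch of that pattern, assuming a small stand-in model; the real `distributed/test_torchrun_example_moe.py` may differ:

```python
# Launch with: TP_SIZE=2 DP_SIZE=2 torchrun --nproc-per-node=4 sketch.py
import os

from vllm import LLM, SamplingParams

tp = int(os.environ.get("TP_SIZE", "1"))
pp = int(os.environ.get("PP_SIZE", "1"))
dp = int(os.environ.get("DP_SIZE", "1"))
enable_ep = os.environ.get("ENABLE_EP", "0") == "1"

# torchrun sets WORLD_SIZE; the parallel degrees must cover every rank.
assert tp * pp * dp == int(os.environ["WORLD_SIZE"])

llm = LLM(
    model="facebook/opt-125m",  # stand-in model, not taken from the test itself
    tensor_parallel_size=tp,
    pipeline_parallel_size=pp,
    data_parallel_size=dp,
    enable_expert_parallel=enable_ep,
    distributed_executor_backend="external_launcher",
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```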
@@ -230,16 +220,14 @@ steps:
num_gpus: 2
source_file_dependencies:
- vllm/
- tests/metrics
- tests/v1/tracing
commands:
- pytest -v -s metrics
- "pip install \
'opentelemetry-sdk>=1.26.0' \
'opentelemetry-api>=1.26.0' \
'opentelemetry-exporter-otlp>=1.26.0' \
'opentelemetry-semantic-conventions-ai>=0.4.1'"
- pytest -v -s tracing
- pytest -v -s v1/tracing

##### fast check tests #####
##### 1 GPU test #####
@@ -302,6 +290,7 @@ steps:
# split the test to avoid interference
- pytest -v -s v1/core
- pytest -v -s v1/executor
- pytest -v -s v1/kv_offload
- pytest -v -s v1/sample
- pytest -v -s v1/logits_processors
- pytest -v -s v1/worker
@@ -335,12 +324,11 @@ steps:
- python3 offline_inference/vision_language.py --seed 0
- python3 offline_inference/vision_language_pooling.py --seed 0
- python3 offline_inference/vision_language_multi_image.py --seed 0
- VLLM_USE_V1=0 python3 others/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 others/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- python3 others/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 others/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- python3 offline_inference/encoder_decoder_multimodal.py --model-type whisper --seed 0
- python3 offline_inference/basic/classify.py
- python3 offline_inference/basic/embed.py
- python3 offline_inference/basic/score.py
- VLLM_USE_V1=0 python3 offline_inference/profiling.py --model facebook/opt-125m run_num_steps --num-steps 2

- label: Platform Tests (CUDA) # 4min
timeout_in_minutes: 15
@@ -809,7 +797,7 @@ steps:
# Quantization
- pytest -v -s tests/kernels/quantization/test_cutlass_scaled_mm.py -k 'fp8'
- pytest -v -s tests/kernels/quantization/test_nvfp4_quant.py
- pytest -v -s tests/kernels/quantization/test_silu_nvfp4_quant_fusion.py
- pytest -v -s tests/kernels/quantization/test_silu_mul_nvfp4_quant.py
- pytest -v -s tests/kernels/quantization/test_nvfp4_scaled_mm.py
- pytest -v -s tests/kernels/quantization/test_flashinfer_scaled_mm.py
- pytest -v -s tests/kernels/quantization/test_flashinfer_nvfp4_scaled_mm.py
@@ -821,6 +809,20 @@
- pytest -v -s tests/kernels/moe/test_flashinfer.py
- pytest -v -s tests/compile/test_silu_mul_quant_fusion.py

- label: GPT-OSS Eval (Blackwell)
timeout_in_minutes: 60
working_dir: "/vllm-workspace/"
gpu: b200
optional: true # disable while debugging
source_file_dependencies:
- tests/evals/gpt_oss
- vllm/model_executor/models/gpt_oss.py
- vllm/model_executor/layers/quantization/mxfp4.py
- vllm/v1/attention/backends/flashinfer.py
commands:
- uv pip install --system 'gpt-oss[eval]==0.0.5'
- pytest -s -v tests/evals/gpt_oss/test_gpqa_correctness.py --model openai/gpt-oss-20b --metric 0.58 --server-args '--tensor-parallel-size 2'

##### 1 GPU test #####
##### multi gpus test #####

@@ -876,8 +878,6 @@ steps:
- tests/distributed/
- vllm/compilation
- vllm/worker/worker_base.py
- vllm/worker/worker.py
- vllm/worker/model_runner.py
- entrypoints/llm/test_collective_rpc.py
- tests/v1/test_async_llm_dp.py
- tests/v1/test_external_lb_dp.py
@@ -901,7 +901,7 @@ steps:
- pytest -v -s distributed/test_sequence_parallel.py
# this test fails consistently.
# TODO: investigate and fix
- VLLM_USE_V1=0 CUDA_VISIBLE_DEVICES=0,1 pytest -v -s test_sharded_state_loader.py
- CUDA_VISIBLE_DEVICES=0,1 pytest -v -s test_sharded_state_loader.py
- CUDA_VISIBLE_DEVICES=0,1 pytest -v -s v1/shutdown
- pytest -v -s models/multimodal/generation/test_maverick.py

34 changes: 22 additions & 12 deletions .github/CODEOWNERS
@@ -4,11 +4,8 @@
# This lists cover the "core" components of vLLM that require careful review
/vllm/attention @LucasWilkinson
/vllm/attention/backends/abstract.py @WoosukKwon @zhuohan123 @youkaichao @alexm-redhat @comaniac @njhill
/vllm/core @zhuohan123 @youkaichao @alexm-redhat @comaniac @njhill
/vllm/engine/llm_engine.py @zhuohan123 @youkaichao @alexm-redhat @comaniac @njhill
/vllm/executor/executor_base.py @zhuohan123 @youkaichao @alexm-redhat @comaniac @njhill @22quinn
/vllm/worker/worker_base.py @zhuohan123 @youkaichao @alexm-redhat @comaniac @njhill @22quinn
/vllm/worker/worker.py @zhuohan123 @youkaichao @alexm-redhat @comaniac @njhill
/vllm/model_executor/layers/fused_moe @mgoin
/vllm/model_executor/layers/sampler.py @zhuohan123 @youkaichao @alexm-redhat @comaniac @njhill @NickLucche
/vllm/model_executor/layers/quantization @mgoin @robertgshaw2-redhat @tlrmchlsmth @yewentao256
@@ -22,7 +19,7 @@
/vllm/reasoning @aarnphm @chaunceyjiang
/vllm/entrypoints @aarnphm @chaunceyjiang
/vllm/compilation @zou3519 @youkaichao @ProExpertProg
/vllm/distributed/kv_transfer @NickLucche
/vllm/distributed/kv_transfer @NickLucche @ApostaC
CMakeLists.txt @tlrmchlsmth @LucasWilkinson

# Any change to the VllmConfig changes can have a large user-facing impact,
@@ -35,12 +32,12 @@ CMakeLists.txt @tlrmchlsmth @LucasWilkinson
/vllm/v1/spec_decode @benchislett @luccafong
/vllm/v1/attention/backends/flashinfer.py @mgoin
/vllm/v1/attention/backends/triton_attn.py @tdoublep
/vllm/v1/core @heheda12345
/vllm/v1/core @WoosukKwon @robertgshaw2-redhat @njhill @ywang96 @comaniac @alexm-redhat @heheda12345 @ApostaC
/vllm/v1/kv_cache_interface.py @heheda12345
/vllm/v1/offloading @ApostaC

# Test ownership
/.buildkite/lm-eval-harness @mgoin @simon-mo
/tests/async_engine @njhill @robertgshaw2-redhat @simon-mo
/tests/distributed/test_multi_node_assignment.py @youkaichao
/tests/distributed/test_pipeline_parallel.py @youkaichao
/tests/distributed/test_same_node.py @youkaichao
@@ -49,30 +46,43 @@ CMakeLists.txt @tlrmchlsmth @LucasWilkinson
/tests/kernels @mgoin @tlrmchlsmth @WoosukKwon @yewentao256
/tests/models @DarkLight1337 @ywang96
/tests/multimodal @DarkLight1337 @ywang96 @NickLucche
/tests/prefix_caching @comaniac @KuntaiDu
/tests/quantization @mgoin @robertgshaw2-redhat @yewentao256
/tests/test_inputs.py @DarkLight1337 @ywang96
/tests/v1/entrypoints/llm/test_struct_output_generate.py @mgoin @russellb @aarnphm
/tests/v1/structured_output @mgoin @russellb @aarnphm
/tests/v1/core @heheda12345
/tests/v1/core @WoosukKwon @robertgshaw2-redhat @njhill @ywang96 @comaniac @alexm-redhat @heheda12345 @ApostaC
/tests/weight_loading @mgoin @youkaichao @yewentao256
/tests/lora @jeejeelee
/tests/models/language/generation/test_hybrid.py @tdoublep
/tests/v1/kv_connector/nixl_integration @NickLucche
/tests/v1/kv_connector/nixl_integration @NickLucche
/tests/v1/kv_connector @ApostaC
/tests/v1/offloading @ApostaC

# Transformers backend
/vllm/model_executor/models/transformers.py @hmellor
/tests/models/test_transformers.py @hmellor

# Docs
/docs @hmellor
/docs/mkdocs @hmellor
/docs/**/*.yml @hmellor
/requirements/docs.txt @hmellor
.readthedocs.yaml @hmellor
mkdocs.yaml @hmellor

# Linting
.markdownlint.yaml @hmellor
.pre-commit-config.yaml @hmellor
/tools/pre_commit @hmellor

# CPU
/vllm/v1/worker/^cpu @bigPYJ1151
/vllm/v1/worker/cpu* @bigPYJ1151
/csrc/cpu @bigPYJ1151
/vllm/platforms/cpu.py @bigPYJ1151
/cmake/cpu_extension.cmake @bigPYJ1151
/docker/Dockerfile.cpu @bigPYJ1151

# Intel GPU
/vllm/v1/worker/^xpu @jikunshang
/vllm/v1/worker/xpu* @jikunshang
/vllm/platforms/xpu.py @jikunshang
/docker/Dockerfile.xpu @jikunshang

4 changes: 0 additions & 4 deletions .github/ISSUE_TEMPLATE/750-RFC.yml
@@ -43,10 +43,6 @@ body:
Any other things you would like to mention.
validations:
required: false
- type: markdown
attributes:
value: >
Thanks for contributing 🎉! The vLLM core team hosts a biweekly RFC review session at 9:30AM Pacific Time, while most RFCs can be discussed online, you can optionally sign up for a slot to discuss your RFC online [here](https://docs.google.com/document/d/1CiLVBZeIVfR7_PNAKVSusxpceywkoOOB78qoWqHvSZc/edit).
- type: checkboxes
id: askllm
attributes:
19 changes: 18 additions & 1 deletion .github/mergify.yml
@@ -171,7 +171,7 @@ pull_request_rules:
- files=examples/online_serving/openai_chat_completion_structured_outputs.py
- files=examples/online_serving/openai_chat_completion_structured_outputs_with_reasoning.py
- files~=^tests/v1/structured_output/
- files=tests/v1/entrypoints/llm/test_guided_generate.py
- files=tests/v1/entrypoints/llm/test_struct_output_generate.py
- files~=^vllm/v1/structured_output/
actions:
label:
@@ -302,3 +302,20 @@
label:
remove:
- needs-rebase

- name: label-kv-connector
description: Automatically apply kv-connector label
conditions:
- or:
- files~=^examples/online_serving/disaggregated[^/]*/.*
- files~=^examples/offline_inference/disaggregated[^/]*/.*
- files~=^examples/others/lmcache/
- files~=^tests/v1/kv_connector/
- files~=^vllm/distributed/kv_transfer/
- title~=(?i)\bP/?D\b
- title~=(?i)NIXL
- title~=(?i)LMCache
actions:
label:
add:
- kv-connector
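(For reference, the three title conditions above are case-insensitive regexes: `\bP/?D\b` matches word-bounded "PD" or "P/D", so a hypothetical PR titled "[P/D] NIXL connector fix" would receive the `kv-connector` label from either of the first two title rules.)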