Pull requests: vllm-project/vllm

Pull requests list

Use xla flag to improve the quantized model performance (labels: ready, tpu, v1)
#19303 opened Jun 6, 2025 by vanbasten23
3 tasks done
[Misc] Change tests/compile to use VLLM_V1 by default (labels: ready)
#19302 opened Jun 6, 2025 by zou3519 (Draft)
[Bugfix] Re-enable use_cudagraph in vLLM v1 (labels: ready)
#19299 opened Jun 6, 2025 by zou3519
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (labels: ready, v1)
#19298 opened Jun 6, 2025 by varun-sundar-rabindranath
[CI] Update FlashInfer to 0.2.6 (labels: ci/build)
#19297 opened Jun 6, 2025 by mgoin
[Quantization] Bump compressed-tensors version (labels: ci/build, ready)
#19295 opened Jun 6, 2025 by kylesayrs
[V1] Add API docs for EncoderCacheManager (labels: ready, v1)
#19294 opened Jun 6, 2025 by russellb
[TPU] support fp8 kv cache quantization (labels: tpu, v1)
#19292 opened Jun 6, 2025 by yaochengji
[Metrics] Compute and log the serving FLOPs (labels: documentation)
#19290 opened Jun 6, 2025 by sysradium
[Misc] Add documentation update reminder to PR template (labels: ci/build)
#19289 opened Jun 6, 2025 by Isotr0py
1 of 3 tasks
[CI/Build] Improve Llama GGUF test robustness (labels: ready)
#19287 opened Jun 6, 2025 by Isotr0py
1 of 3 tasks
[Core] Update error message for Whisper + num-scheduler-steps > 1 (labels: ready)
#19286 opened Jun 6, 2025 by russellb
[Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer (labels: ready, v1)
#19283 opened Jun 6, 2025 by chaunceyjiang
[V1][Kernel] Flashinfer HND KV cache layout (labels: v1)
#19280 opened Jun 6, 2025 by NickLucche
Update compatible packaging version (labels: ci/build, ready)
#19279 opened Jun 6, 2025 by pramenku
[Frontend] optimize beam_search code
#19267 opened Jun 6, 2025 by zhanggzh
Fix TorchAOConfig skip layers
#19265 opened Jun 6, 2025 by mobicham
[CPU] Fix torch version in x86 CPU backend and refine default configurations (labels: ci/build, multi-modality, v1)
#19258 opened Jun 6, 2025 by bigPYJ1151
3 tasks done
[Bugfix] ROCm FP8 Quantization Padding Issue
#19251 opened Jun 6, 2025 by vllmellm
[Misc] refactor context extension (labels: documentation)
#19246 opened Jun 6, 2025 by reidliu41
3 tasks