vllm-project / vllm Public

Pull requests: vllm-project/vllm

696 Open 9,025 Closed

Pull requests list

Use xla flag to improve the quantized model performance
Labels: ready (ONLY add when PR is ready to merge/full CI is needed), tpu (Related to Google TPUs)
#19303 opened Jun 6, 2025 by vanbasten23
3 tasks done

[Misc] Change tests/compile to use VLLM_V1 by default
Labels: ready
#19302 opened Jun 6, 2025 by zou3519 • Draft

Add optional token-level progress bar to LLM.beam_search using tqdm
Labels: frontend
#19301 opened Jun 6, 2025 by NekoMimiUnagi
3 tasks done

[Bugfix] Re-enable use_cudagraph in vLLM v1
Labels: ready
#19299 opened Jun 6, 2025 by zou3519

[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination.
Labels: ready
#19298 opened Jun 6, 2025 by varun-sundar-rabindranath

[CI] Update FlashInfer to 0.2.6
Labels: ci/build
#19297 opened Jun 6, 2025 by mgoin

[Quantization] Bump compressed-tensors version
Labels: ci/build, ready
#19295 opened Jun 6, 2025 by kylesayrs

[V1] Add API docs for EncoderCacheManager
Labels: ready
#19294 opened Jun 6, 2025 by russellb

[TPU] support fp8 kv cache quantization
Labels: tpu
#19292 opened Jun 6, 2025 by yaochengji

[Metrics] Compute and log the serving FLOPs
Labels: documentation (Improvements or additions to documentation)
#19290 opened Jun 6, 2025 by sysradium

[Misc] Add documentation update reminder to PR template
Labels: ci/build
#19289 opened Jun 6, 2025 by Isotr0py
1 of 3 tasks

[Frontend] Remove unreachable code from llm.py
Labels: frontend
#19288 opened Jun 6, 2025 by KsuParkhamchuk

[CI/Build] Improve Llama GGUF test robustness
Labels: ready
#19287 opened Jun 6, 2025 by Isotr0py
1 of 3 tasks

[Core] Update error message for Whisper + num-scheduler-steps > 1
Labels: ready
#19286 opened Jun 6, 2025 by russellb

[Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer
Labels: ready
#19283 opened Jun 6, 2025 by chaunceyjiang

[V1][Kernel] Flashinfer HND KV cache layout
Labels: v1
#19280 opened Jun 6, 2025 by NickLucche

Update compatible packaging version
Labels: ci/build, ready
#19279 opened Jun 6, 2025 by pramenku

[Frontend] optimize beam_search code
#19267 opened Jun 6, 2025 by zhanggzh

Fix TorchAOConfig skip layers
#19265 opened Jun 6, 2025 by mobicham

Convert kv_transfer_config from dict to KVTransferConfig to fix #19259
Labels: frontend
#19262 opened Jun 6, 2025 by maobaolong

[New Model]: Support Qwen3 Embedding & Reranker
Labels: frontend
#19260 opened Jun 6, 2025 by noooop

[CPU] Fix torch version in x86 CPU backend and refine default configurations
Labels: ci/build, multi-modality (Related to multi-modality (#4194))
#19258 opened Jun 6, 2025 by bigPYJ1151
3 tasks done

[CI][PowerPC] Use a more appropriate way to select testcase in tests/models/language/pooling/test_embedding.py
Labels: ci/build, ready
#19253 opened Jun 6, 2025 by AaruniAggarwal

[Bugfix] ROCm FP8 Quantization Padding Issue
#19251 opened Jun 6, 2025 by vllmellm

[Misc] refactor context extension
Labels: documentation
#19246 opened Jun 6, 2025 by reidliu41
3 tasks
