Search Results · repo:vllm-project/vllm language:Python
9k results

Your current environment
The output of `python collect_env.py`:
==============================
System Info
==============================
OS ...
bug
twright8
- 1
- Opened 1 hour ago
- #20193
Name of failing test
tests/quantization/test_fp8.py::test_scaled_fp8_quant
Basic information
- [ ] Flaky test
- [x] Can reproduce locally
- [ ] Caused by external libraries (e.g. bug in transformers) ...
ci-failure
mgoin
- Opened 2 hours ago
- #20192
Something has changed since the working commit (0.9.2.dev223+gee5ad8d2c plus my PR). I can reproduce the same gibberish
on 0.9.2.dev283+ge9fd658af even without the full cudagraph compile option.
After ...
cjackal
- 1
- Opened 4 hours ago
- #20186
WARNING 06-27 13:31:35 [sampling_params.py:344] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
(VllmWorker rank=1 pid=1162710) ...
bug
sleepwalker2017
- Opened 4 hours ago
- #20184
I've spent hours trying to get the Hunyuan model working with vLLM. Downgrading, upgrading, testing different
versions... nothing worked so far.
The team shared a Docker image using vLLM 0.8.5, so I’m assuming ...
usage
summersonnn
- Opened 5 hours ago
- #20183
🚀 The feature, motivation and pitch
Tencent released this new model: https://huggingface.co/tencent/Hunyuan-A13B-Instruct
It matches bigger models on benchmarks. It has a decent size to run locally and ...
feature request
RodriMora
- 2
- Opened 7 hours ago
- #20182
🚀 The feature, motivation and pitch
It would be great if we could have support for batch inference for online serving. It seems only supported for offline
inference. Also, it seems that the OpenAI interface ...
feature request
eslambakr
- Opened 7 hours ago
- #20181
Your current environment
The output of `python collect_env.py`:
Collecting environment information...
==============================
System Info
============================== ...
bug
tfia
- Opened 9 hours ago
- #20178
Your current environment
The output of `python collect_env.py`:
Collecting environment information...
==============================
System Info
============================== ...
bug
luoling1993
- 2
- Opened 9 hours ago
- #20177
Proposal to improve performance
By reading relative parts of source code and running some test, we find that when launching a MoE model like Qwen3, vLLM
seems to use Triton-based fused moe kernel. While ...
performance
oldcpple
- Opened 9 hours ago
- #20176
