
Issue search results · repo:vllm-project/vllm language:Python


9k results (88 ms)


Your current environment: the output of `python collect_env.py`: ============================== System Info ============================== OS ...
bug
  • twright8
  • 1
  • Opened 1 hour ago
  • #20193

Name of failing test: tests/quantization/test_fp8.py::test_scaled_fp8_quant Basic information: - [ ] Flaky test - [x] Can reproduce locally - [ ] Caused by external libraries (e.g. bug in transformers) ...
ci-failure
  • mgoin
  • Opened 2 hours ago
  • #20192
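For reference, a minimal sketch of reproducing this failure locally (the issue marks it as reproducible locally; this assumes a vLLM source checkout with pytest and the test dependencies installed):

```python
# Sketch: run the single failing test node from a vLLM source checkout.
# Assumes pytest and vLLM's test dependencies are installed.
import pytest

# pytest.main accepts the same node-id selection syntax as the CLI.
exit_code = pytest.main(
    ["tests/quantization/test_fp8.py::test_scaled_fp8_quant", "-v"]
)
print(f"pytest exit code: {exit_code}")
```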

Something has changed since the working commit (0.9.2.dev223+gee5ad8d2c plus my PR). I can reproduce the same gibberish on 0.9.2.dev283+ge9fd658af even without the full cudagraph compile option. After ...
  • cjackal
  • 1
  • Opened 4 hours ago
  • #20186

WARNING 06-27 13:31:35 [sampling_params.py:344] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01. (VllmWorker rank=1 pid=1162710) ...
bug
  • sleepwalker2017
  • Opened 4 hours ago
  • #20184
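A minimal sketch of what triggers the warning above, assuming the clamp is applied when SamplingParams is constructed (the log line points at sampling_params.py:344):

```python
# Sketch: a temperature below vLLM's 0.01 floor triggers the warning
# quoted above. Assumes the clamp happens at SamplingParams construction.
from vllm import SamplingParams

params = SamplingParams(temperature=1e-6, max_tokens=64)

# Per the warning text, the effective temperature is raised to 0.01.
print(params.temperature)
```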

I've spent hours trying to get the Hunyuan model working with vLLM. Downgrading, upgrading, testing different versions... nothing has worked so far. The team shared a Docker image using vLLM 0.8.5, so I'm assuming ...
usage
  • summersonnn
  • Opened 5 hours ago
  • #20183

🚀 The feature, motivation and pitch Tencent released this new model: https://huggingface.co/tencent/Hunyuan-A13B-Instruct It matches bigger models on benchmarks. It has a decent size to run locally and ...
feature request
  • RodriMora
  • 2
  • Opened 7 hours ago
  • #20182

🚀 The feature, motivation and pitch It would be great if we could have support for batch inference for online serving; it seems to be supported only for offline inference. Also, it seems that the OpenAI interface ...
feature request
  • eslambakr
  • Opened 7 hours ago
  • #20181
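For contrast, the offline batched path that the request says is already supported looks roughly like this (a sketch; the model name is a small placeholder chosen for illustration):

```python
# Sketch of offline batched inference: LLM.generate accepts a list of
# prompts and batches them internally.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model for illustration
prompts = [
    "Hello, my name is",
    "The capital of France is",
]
outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
for output in outputs:
    print(output.outputs[0].text)
```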

Your current environment: the output of `python collect_env.py`: Collecting environment information... ============================== System Info ============================== ...
bug
  • tfia
  • Opened 9 hours ago
  • #20178

Your current environment: the output of `python collect_env.py`: Collecting environment information... ============================== System Info ============================== ...
bug
  • luoling1993
  • 2
  • Opened 9 hours ago
  • #20177

Proposal to improve performance By reading the relevant parts of the source code and running some tests, we find that when launching a MoE model like Qwen3, vLLM seems to use a Triton-based fused MoE kernel. While ...
performance
  • oldcpple
  • Opened 9 hours ago
  • #20176