
How to install and use vLLM to serve multiple large language models #15602

@moshilangzi

Description

Proposal to improve performance

How to install and use vLLM to serve multiple large language models:

google/gemma-3-27b-it
mistralai/Mistral-Small-3.1-24B-Instruct-2503
Skywork/Skywork-R1V-38B
Qwen/Qwen2.5-VL

The core issue is that google/gemma-3-27b-it requires a specific, recent version of the transformers library. Installing that version might break compatibility with vLLM itself, or with the other models (such as certain Mistral or Qwen versions), which may depend on a different and possibly older transformers version that vLLM commonly supports.
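One way to sidestep the conflict described above is to run one vLLM server per model, each from its own virtual environment with its own transformers version. The following is only a minimal sketch of that idea, not a method confirmed in this issue. The environment root `/opt/vllm-envs`, the directory naming scheme, the `ServerPlan` and `plan_servers` names, and the POSIX `bin/vllm` layout are all assumptions made for illustration. The sketch only builds the `vllm serve <model> --port <port>` command lines and does not launch anything.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ServerPlan:
    """One model served by one vLLM process from one isolated environment."""

    model: str  # Hugging Face model ID, as listed in the issue
    venv: Path  # hypothetical per-model virtual environment directory
    port: int   # port this model's server would listen on

    def command(self) -> list[str]:
        # Assumes a POSIX venv layout where the CLI lives at <venv>/bin/vllm.
        return [
            str(self.venv / "bin" / "vllm"),
            "serve",
            self.model,
            "--port",
            str(self.port),
        ]


def plan_servers(
    models: list[str], venv_root: Path, base_port: int = 8000
) -> list[ServerPlan]:
    """Assign each model its own environment directory and consecutive port."""
    plans = []
    for offset, model in enumerate(models):
        # Hypothetical naming scheme: "org/name" becomes "org__name".
        venv_dir = venv_root / model.replace("/", "__")
        plans.append(ServerPlan(model, venv_dir, base_port + offset))
    return plans


if __name__ == "__main__":
    models = [
        "google/gemma-3-27b-it",
        "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    ]
    for plan in plan_servers(models, Path("/opt/vllm-envs")):
        print(" ".join(plan.command()))
```

Each environment would be created and populated separately, with whichever vLLM and transformers versions that model needs. Every server still needs enough GPU memory of its own, so this layout trades dependency isolation for a larger hardware footprint.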


Your current environment (if you think it is necessary)

The output of `python collect_env.py`
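No environment output was attached. Since the question hinges on which transformers version is installed alongside vLLM, a small sketch like the one below could report the relevant versions. It is not a substitute for `collect_env.py`; the helper name `installed_versions` and the choice of packages to check are assumptions made for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            report[name] = None
    return report


if __name__ == "__main__":
    # Packages chosen for illustration; extend the list as needed.
    for name, version in installed_versions(["vllm", "transformers", "torch"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it once inside each environment shows whether the vLLM and transformers versions differ between them.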

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.


Labels: stale (over 90 days of inactivity)
