
Conversation

hmellor (Member) commented Sep 18, 2025

Adds support for encoder models to the Transformers backend.

Depends on

I have gated this feature on the Transformers version, so as soon as the dependency below is merged we can merge this PR, and users installing both vLLM and Transformers from main will be able to use it:

Changes

  • Use EncoderOnlyAttention if an encoder model is detected
  • Skip position_ids buffers if they're found in the checkpoint because vLLM always passes position_ids anyway
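The two changes above can be sketched roughly as follows. This is an illustrative sketch, not vLLM's actual internals: the helper names (`is_encoder_only`, `pick_attention_class`, `should_skip_weight`) are made up for this example; only `EncoderOnlyAttention` and the `position_ids` buffer name come from the PR itself.

```python
# Illustrative sketch of the detection and weight-skipping logic described
# above. Helper names are hypothetical; they do not match vLLM's code.

def is_encoder_only(hf_config) -> bool:
    # Encoder-only (BERT-style) Transformers configs report is_causal=False;
    # decoder configs either omit the attribute or set it to True.
    return getattr(hf_config, "is_causal", True) is False

def pick_attention_class(hf_config) -> str:
    # Encoder models get bidirectional EncoderOnlyAttention; decoders keep
    # the regular causal Attention class.
    return "EncoderOnlyAttention" if is_encoder_only(hf_config) else "Attention"

def should_skip_weight(name: str) -> bool:
    # Some encoder checkpoints persist position_ids as a registered buffer.
    # vLLM always passes position_ids at runtime, so the buffer can be
    # skipped during weight loading.
    return name.endswith("position_ids")
```

A BERT-style config (with `is_causal=False`) would select `EncoderOnlyAttention`, and a checkpoint entry like `embeddings.position_ids` would be skipped while real weights are loaded normally.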

Testing

python examples/offline_inference/basic/embed.py --model BAAI/bge-base-en-v1.5 --model-impl transformers
pytest tests/models/test_transformers.py -k test_embed_correctness

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
hmellor marked this pull request as draft September 18, 2025 14:06
gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for encoder models to the Transformers backend. The main changes include detecting encoder models by checking for is_causal=False and using EncoderOnlyAttention accordingly. It also adds logic to skip loading position_ids from checkpoints for encoder models, as vLLM handles this. The changes look good, but I have a critical suggestion to improve the type hint for create_attention_instances to reflect that it can return EncoderOnlyAttention instances, which will improve code clarity and maintainability.
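The bot's suggestion can be sketched as below. The class bodies and the function signature are placeholders for illustration only; vLLM's actual `create_attention_instances` signature differs.

```python
# Placeholder classes standing in for vLLM's attention layers (illustrative).
class Attention:
    pass

class EncoderOnlyAttention(Attention):
    # Bidirectional attention for encoder-only models.
    pass

# The suggested annotation makes explicit that the factory can return
# EncoderOnlyAttention instances for encoder models.
def create_attention_instances(
    num_layers: int, encoder_only: bool
) -> "dict[int, Attention | EncoderOnlyAttention]":
    cls = EncoderOnlyAttention if encoder_only else Attention
    return {i: cls() for i in range(num_layers)}
```

With the union in the return type, readers and type checkers see immediately that downstream code may receive either attention variant.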

hmellor requested a review from Isotr0py September 18, 2025 15:05
hmellor marked this pull request as ready for review September 18, 2025 15:05
mergify bot added the `documentation` label Sep 18, 2025
Isotr0py (Member) left a comment

Just a nit. Otherwise LGTM!

Isotr0py added the `ready` label Sep 18, 2025
hmellor merged commit 12aed7e into vllm-project:main Sep 19, 2025
50 checks passed
hmellor deleted the transformers-backend-encoders branch September 19, 2025 18:15
debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
hmellor moved this to Done in Transformers backend Sep 24, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Labels
documentation, ready

Projects
Status: Done

2 participants