Your current environment
For some reason, I can't run `collect_env.py` in my environment. Sorry about that. :) But I'm sure this problem has nothing to do with the environment.
My environment:
vllm: 0.7.3
cuda: 12.4
transformers: 4.50.1
trl: 0.15.2
🐛 Describe the bug
My code using vLLM:
```python
from vllm import LLM, SamplingParams

llm = LLM(model=model_name_or_path,
          dtype="float16",
          tensor_parallel_size=tensor_parallel_size,
          max_num_seqs=batch_size,
          max_model_len=None if max_model_len == -1 else max_model_len,
          gpu_memory_utilization=0.9)
sampling_params = SamplingParams(temperature=temperature, max_tokens=max_tokens)
outputs = llm.generate(prompts, sampling_params)
```
When I try to generate a response from Qwen2.5-7B-Instruct, I hit a ValueError raised by this line:

```
ValueError: Sliding window for some but all layers is not supported. This model uses sliding window but `max_window_layers` = 28 is less than `num_hidden_layers` = 28. Please open an issue to discuss this feature.
```
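For reference, the fields that check looks at can be read straight off the fine-tuned checkpoint's config (a quick sanity check, assuming the checkpoint loads with transformers; `model_name_or_path` is the same variable as in my snippet above):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(model_name_or_path)
print(cfg.use_sliding_window,   # True after my fine-tune
      cfg.sliding_window,       # window size from the Qwen2.5 config
      cfg.max_window_layers,    # 28
      cfg.num_hidden_layers)    # 28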
The model I'm using was fine-tuned with the trl library and FlashAttention-2, with sliding window enabled.
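For context, the sliding-window fields end up in the saved checkpoint roughly like this (a simplified, hypothetical sketch of my training setup; `output_dir` and the trl specifics are placeholders):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical sketch: enable sliding-window attention in the Qwen2.5 config
# before fine-tuning, so the saved config.json carries use_sliding_window: true.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
config.use_sliding_window = True  # disabled by default in Qwen2.5
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    config=config,
    attn_implementation="flash_attention_2",  # FlashAttention-2, as in my run
    torch_dtype="auto",
)
# ... trl fine-tuning happens here (details omitted) ...
model.save_pretrained(output_dir)  # output_dir is a placeholder path
```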
Note that `max_window_layers` (28) is not actually less than `num_hidden_layers` (28) here, so the values don't even match the condition the message describes. There also seems to be a TODO tag on that line; is this check intended to fire in this case?
I'm also curious why everything works fine when I train with trl and vLLM, but vLLM throws this ValueError when I run inference with the fine-tuned model.
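In the meantime, if I'm reading the engine arguments right, vLLM's `disable_sliding_window` flag should avoid this code path by turning off sliding-window attention and capping the context to the window size. A minimal sketch of the workaround I'm considering (not yet verified for long-context quality):

```python
from vllm import LLM

# Sketch of a possible workaround: disable sliding-window attention so the
# Qwen2 sliding-window check never fires; vLLM then caps max_model_len to
# the window size. Variables are the same ones from my snippet above.
llm = LLM(model=model_name_or_path,
          dtype="float16",
          tensor_parallel_size=tensor_parallel_size,
          max_num_seqs=batch_size,
          disable_sliding_window=True)
```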
Before submitting a new issue...