issues Search Results · repo:huggingface/transformers language:Python
17k results
System Info
Irrelevant
Who can help?
@ArthurZucker
Documentation says:
The number of layers that use SWA (Sliding Window Attention). The bottom layers use SWA while the top use full
attention.
But ...
bug
norpadon
- 2
- Opened 4 hours ago
- #38787
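The documented wording being questioned here can be illustrated with a small sketch. This is a minimal illustration of what the docs *say* (bottom layers use SWA, top layers use full attention), not the actual transformers implementation; the function name and return format are made up for the example:

```python
def layer_attention_types(num_hidden_layers: int, max_window_layers: int) -> list[str]:
    """Sketch of the documented semantics: the bottom `max_window_layers`
    layers use sliding window attention (SWA), the remaining top layers
    use full attention."""
    return [
        "sliding" if layer_idx < max_window_layers else "full"
        for layer_idx in range(num_hidden_layers)
    ]
```

For example, `layer_attention_types(4, 2)` yields `["sliding", "sliding", "full", "full"]` under that reading; the issue presumably argues the code behaves differently.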
System Info
transformers v4.52.3
The docs at https://huggingface.co/docs/transformers/accelerate show - image
However, when working with accelerate via TrainingArguments I get the following issue with the fsdp strategy ...
bug
PT-10
- 1
- Opened 9 hours ago
- #38776
Feature request
Hi from pytorch distributed! Thanks for showcasing pytorch APIs
device_map="auto" and tp_plan="auto" are somehow coupled right now:
https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4280-L4283 ...
Feature request
weifengpy
- 3
- Opened 17 hours ago
- #38771
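One way to read this request is that the two auto modes should be validated as independent, mutually exclusive options rather than coupled. A hypothetical sketch of such a guard (the function name and error message are invented for illustration; this is not the actual modeling_utils logic linked above):

```python
def resolve_parallelism(device_map=None, tp_plan=None):
    """Hypothetical guard: treat device_map='auto' and tp_plan='auto' as
    independent placement strategies and reject the ambiguous combination
    explicitly instead of silently coupling one to the other."""
    if device_map == "auto" and tp_plan == "auto":
        raise ValueError(
            "device_map='auto' and tp_plan='auto' are mutually exclusive; "
            "choose one placement strategy"
        )
    return device_map, tp_plan
```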
Model description
Apple recently released FastVLM, a new vision-language model introduced at CVPR 2025, which significantly improves on
previous models in the LLaVA family.
The smallest FastVLM variant ...
New model
kamila-chay
- 2
- Opened 23 hours ago
- #38765
System Info
- transformers version: 4.49.0
- Platform: Linux-5.4.0-216-generic-x86_64-with-glibc2.31
- Python version: 3.13.2
- Huggingface_hub version: 0.29.2
- Safetensors version: 0.5.3
...
bug
VladPyzh
- 2
- Opened yesterday
- #38753
System Info
Before PR #38288, the program would run smoothly even when output_attentions=True was set and the attention implementation was not eager, since it would fall back to eager mode; after this PR, ...
bug
kaixuanliu
- Opened yesterday
- #38750
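The pre-#38288 fallback the report describes can be sketched as follows. This is a hypothetical helper, not the actual transformers code path; it only captures the behavior claimed above (non-eager implementations cannot return attention weights, so eager is substituted):

```python
def pick_attn_implementation(requested: str, output_attentions: bool) -> str:
    """Sketch of the fallback described in the issue: when attention weights
    are requested, implementations like 'sdpa' or 'flash_attention_2' cannot
    return them, so fall back to 'eager'; otherwise keep the requested one."""
    if output_attentions and requested != "eager":
        return "eager"
    return requested
```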
System Info
When I set InformerConfig.input_size = 1, I find a bug, but I don't know how to fix it.
- Function Name : create_network_inputs
time_feat = (
torch.cat(
...
bug
2004learner
- 6
- Opened yesterday
- #38745
Feature request
Implement handling for configurations where the q_lora_rank parameter is set to None.
Motivation
1. DeepSeek-V2-Lite model has q_lora_rank=None so we can support this model with this ...
Feature request
bzantium
- Opened yesterday
- #38742
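The requested behavior can be sketched in terms of projection shapes: with q_lora_rank=None, build a single full-rank query projection instead of the low-rank pair. The function and parameter names here are illustrative only, not DeepSeek-V2 code:

```python
def query_proj_shapes(hidden_size: int, q_head_dim: int, q_lora_rank=None):
    """Hypothetical sketch: when q_lora_rank is None (as in DeepSeek-V2-Lite),
    use one direct projection; when it is an int, use the two low-rank
    factors hidden_size -> q_lora_rank -> q_head_dim."""
    if q_lora_rank is None:
        # single full-rank projection
        return [(hidden_size, q_head_dim)]
    # low-rank (LoRA-style) decomposition
    return [(hidden_size, q_lora_rank), (q_lora_rank, q_head_dim)]
```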
Feature request
Have a section on Pruna AI within the documentation. We did a similar PR for diffusers and thought it would be nice to show how to optimize transformers models too.
Motivation
Have ...
Feature request
davidberenstein1957
- 2
- Opened yesterday
- #38740
System Info
For the current version (4.52.4), in the LlamaAttention class, the type hint for the forward function
https://github.com/huggingface/transformers/blob/aa798b7ac9ff5018b3578eb927dc438671ab6a3e/src/transformers/models/llama/modeling_llama.py#L231 ...
bug
nhatkhtn
- 7
- Opened yesterday
- #38739

Learn how you can use GitHub Issues to plan and track your work.
Save views for sprints, backlogs, teams, or releases. Rank, sort, and filter issues to suit the occasion. The possibilities are endless. Learn more about GitHub Issues.
ProTip! Restrict your search to the title by using the in:title qualifier.