issues Search Results · repo:huggingface/transformers language:Python
17k results
System Info
Irrelevant
Who can help?
@ArthurZucker
Documentation says:
The number of layers that use SWA (Sliding Window Attention). The bottom layers use SWA while the top use full
attention.
But ...
bug
norpadon
- 2
- Opened 4 hours ago
- #38787
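The documented wording being questioned here can be illustrated with a small sketch. This is a minimal illustration of what the docs *say* (bottom layers use SWA, top layers use full attention), not the actual transformers implementation; the function name and return format are made up for the example:

```python
def layer_attention_types(num_hidden_layers: int, max_window_layers: int) -> list[str]:
    """Sketch of the documented semantics: the bottom `max_window_layers`
    layers use sliding window attention (SWA), the remaining top layers
    use full attention."""
    return [
        "sliding" if layer_idx < max_window_layers else "full"
        for layer_idx in range(num_hidden_layers)
    ]
```

For example, `layer_attention_types(4, 2)` yields `["sliding", "sliding", "full", "full"]` under that reading; the issue presumably argues the code behaves differently.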
System Info
transformers v4.52.3
The docs at https://huggingface.co/docs/transformers/accelerate show - image
However, when working with accelerate via TrainingArguments I get the following issue with the fsdp strategy ...
bug
PT-10
- 1
- Opened 9 hours ago
- #38776
Feature request
Hi from pytorch distributed! Thanks for showcasing pytorch APIs
device_map="auto" and tp_plan="auto" are somehow coupled right now:
https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4280-L4283 ...
Feature request
weifengpy
- 3
- Opened 17 hours ago
- #38771
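One way to read this request is that the two auto modes should be validated as independent, mutually exclusive options rather than coupled. A hypothetical sketch of such a guard (the function name and error message are invented for illustration; this is not the actual modeling_utils logic linked above):

```python
def resolve_parallelism(device_map=None, tp_plan=None):
    """Hypothetical guard: treat device_map='auto' and tp_plan='auto' as
    independent placement strategies and reject the ambiguous combination
    explicitly instead of silently coupling one to the other."""
    if device_map == "auto" and tp_plan == "auto":
        raise ValueError(
            "device_map='auto' and tp_plan='auto' are mutually exclusive; "
            "choose one placement strategy"
        )
    return device_map, tp_plan
```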
Model description
Apple recently released FastVLM, a new vision-language model introduced at CVPR 2025, which significantly improves on
previous models in the LLaVA family.
The smallest FastVLM variant ...
New model
kamila-chay
- 2
- Opened 23 hours ago
- #38765
System Info
- transformers version: 4.49.0
- Platform: Linux-5.4.0-216-generic-x86_64-with-glibc2.31
- Python version: 3.13.2
- Huggingface_hub version: 0.29.2
- Safetensors version: 0.5.3
...
bug
VladPyzh
- 2
- Opened yesterday
- #38753
System Info
Before PR #38288, the program would run smoothly even when output_attentions=True was set and the attention implementation was not eager, since it would fall back to eager mode; after this PR, ...
bug
kaixuanliu
- Opened yesterday
- #38750
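The pre-#38288 fallback the report describes can be sketched as follows. This is a hypothetical helper, not the actual transformers code path; it only captures the behavior claimed above (non-eager implementations cannot return attention weights, so eager is substituted):

```python
def pick_attn_implementation(requested: str, output_attentions: bool) -> str:
    """Sketch of the fallback described in the issue: when attention weights
    are requested, implementations like 'sdpa' or 'flash_attention_2' cannot
    return them, so fall back to 'eager'; otherwise keep the requested one."""
    if output_attentions and requested != "eager":
        return "eager"
    return requested
```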
System Info
When I set InformerConfig.input_size = 1, I find a bug, but I don't know how to fix it.
- Function Name : create_network_inputs
time_feat = (
torch.cat(
...
bug
2004learner
- 6
- Opened yesterday
- #38745
Feature request
Implement handling for configurations where the q_lora_rank parameter is set to None.
Motivation
1. DeepSeek-V2-Lite model has q_lora_rank=None so we can support this model with this ...
Feature request
bzantium
- Opened yesterday
- #38742
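The requested behavior can be sketched in terms of projection shapes: with q_lora_rank=None, build a single full-rank query projection instead of the low-rank pair. The function and parameter names here are illustrative only, not DeepSeek-V2 code:

```python
def query_proj_shapes(hidden_size: int, q_head_dim: int, q_lora_rank=None):
    """Hypothetical sketch: when q_lora_rank is None (as in DeepSeek-V2-Lite),
    use one direct projection; when it is an int, use the two low-rank
    factors hidden_size -> q_lora_rank -> q_head_dim."""
    if q_lora_rank is None:
        # single full-rank projection
        return [(hidden_size, q_head_dim)]
    # low-rank (LoRA-style) decomposition
    return [(hidden_size, q_lora_rank), (q_lora_rank, q_head_dim)]
```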
Feature request
Have a section on Pruna AI within the documentation. We did a similar PR for diffusers and thought it would be nice to show how to optimize transformers models too.
Motivation
Have ...
Feature request
davidberenstein1957
- 2
- Opened yesterday
- #38740
System Info
For the current version (4.52.4), in the LlamaAttention class, the type hint for the forward function
https://github.com/huggingface/transformers/blob/aa798b7ac9ff5018b3578eb927dc438671ab6a3e/src/transformers/models/llama/modeling_llama.py#L231 ...
bug
nhatkhtn
- 7
- Opened yesterday
- #38739

Learn how you can use GitHub Issues to plan and track your work.
Save views for sprints, backlogs, teams, or releases. Rank, sort, and filter issues to suit the occasion. The possibilities are endless. Learn more about GitHub Issues.
ProTip! Restrict your search to the title by using the in:title qualifier.