Insights: huggingface/transformers
Overview
3 Releases published by 1 person
-
v4.49.0-Gemma-3 Gemma 3 (Based on v4.49.0)
published
Mar 18, 2025 -
v4.49.0-Mistral-3 Mistral 3 (Based on v4.49.0)
published
Mar 18, 2025 -
v4.50.0 Release v4.50.0
published
Mar 21, 2025
89 Pull requests merged by 54 people
-
Update installation.md
#36826 merged
Mar 21, 2025 -
[docs] Model docs
#36469 merged
Mar 21, 2025 -
Fix Pan and Scan on batched images Gemma3
#36864 merged
Mar 21, 2025 -
Simplify keep_in_fp32_modules logic
#36722 merged
Mar 21, 2025 -
fix: loss computation after embeddings resize - mllama
#36840 merged
Mar 21, 2025 -
Fix: dtype cannot be str
#36262 merged
Mar 21, 2025 -
Minor Gemma 3 fixes
#36884 merged
Mar 21, 2025 -
Use deformable_detr kernel from the Hub
#36853 merged
Mar 21, 2025 -
Gemma 3 tests expect greedy decoding
#36882 merged
Mar 21, 2025 -
🔴 🔴 🔴 supersede paligemma forward to shift pos id indexing
#36859 merged
Mar 21, 2025 -
[generate] model defaults being inherited only happens for newer models
#36881 merged
Mar 21, 2025 -
Revert "Update deprecated Jax calls (#35919)"
#36880 merged
Mar 21, 2025 -
Make ViTPooler configurable
#36517 merged
Mar 21, 2025 -
chore: fix typos in the tests directory
#36813 merged
Mar 21, 2025 -
Remove call to .item in get_batch_samples
#36861 merged
Mar 21, 2025 -
FIX FSDP plugin update for QLoRA
#36720 merged
Mar 21, 2025 -
[CI] doc builder without custom image
#36862 merged
Mar 21, 2025 -
Mllama: raise better error
#35934 merged
Mar 21, 2025 -
Refactor Aya Vision with modular
#36688 merged
Mar 20, 2025 -
Add support for seed in DataCollatorForLanguageModeling
#36497 merged
Mar 20, 2025 -
[CI] fix update metadata job
#36850 merged
Mar 20, 2025 -
Gemma3: fix test
#36820 merged
Mar 20, 2025 -
[torchao] revert to get_apply_tensor_subclass
#36849 merged
Mar 20, 2025 -
Add model visual debugger
#36798 merged
Mar 20, 2025 -
Add Prompt Depth Anything Model
#35401 merged
Mar 20, 2025 -
Refactor Attention implementation for ViT-based models
#36545 merged
Mar 20, 2025 -
DeepSpeed tensor parallel+ZeRO
#36825 merged
Mar 20, 2025 -
Support loading Quark quantized models in Transformers
#36372 merged
Mar 20, 2025 -
Use pyupgrade --py39-plus to improve code
#36843 merged
Mar 20, 2025 -
Fix hqq skipped modules and dynamic quant
#36821 merged
Mar 20, 2025 -
Fix ONNX export for sequence classification head
#36332 merged
Mar 20, 2025 -
Shieldgemma2
#36678 merged
Mar 20, 2025 -
Fix: remove the redundant snippet of _whole_word_mask
#36759 merged
Mar 20, 2025 -
Gemma 3: Adding explicit GenerationConfig and refactoring conversion …
#36833 merged
Mar 20, 2025 -
Fix import for torch 2.0, 2.1 - guard typehint for "device_mesh"
#36768 merged
Mar 20, 2025 -
Update min safetensors bis
#36823 merged
Mar 20, 2025 -
[generate] clarify docstrings: when to inherit GenerationMixin
#36605 merged
Mar 20, 2025 -
[modular] Sort modular skips
#36304 merged
Mar 20, 2025 -
Pass state dict
#35234 merged
Mar 20, 2025 -
[qwen2 audio] remove redundant code and update docs
#36282 merged
Mar 20, 2025 -
Update deprecated Jax calls
#35919 merged
Mar 20, 2025 -
Fix fp16 ONNX export for RT-DETR and RT-DETRv2
#36460 merged
Mar 20, 2025 -
Pass num_items_in_batch directly to loss computation
#36753 merged
Mar 20, 2025 -
Saving Trainer.collator.tokenizer when Trainer.processing_class is None
#36552 merged
Mar 20, 2025 -
fix tiktoken convert to pass AddedToken to Tokenizer
#36566 merged
Mar 20, 2025 -
[ForCausalLMLoss] allow users to pass shifted labels
#36607 merged
Mar 20, 2025 -
Disable inductor config setter by default
#36608 merged
Mar 20, 2025 -
Fix swanlab global step
#36728 merged
Mar 20, 2025 -
rewrite main method in Qwen2, making it more clear
#36772 merged
Mar 20, 2025 -
Move the warning to the documentation for DataCollatorWithFlattening
#36707 merged
Mar 20, 2025 -
Remove our AdamW implementation
#36177 merged
Mar 19, 2025 -
Update configuration_qwen2.py
#36735 merged
Mar 19, 2025 -
quick fix fast_image_processor register error
#36716 merged
Mar 19, 2025 -
Add Space to Bitsandbytes doc
#36834 merged
Mar 19, 2025 -
Support tracable dynamicKVcache
#36311 merged
Mar 19, 2025 -
One more fix for reviewer assignment
#36829 merged
Mar 19, 2025 -
[gemma 3] multimodal checkpoints + AutoModelForCausalLM
#36741 merged
Mar 19, 2025 -
enable OffloadedCache on XPU from PyTorch 2.7
#36654 merged
Mar 19, 2025 -
Add option for ao base configs
#36526 merged
Mar 19, 2025 -
Add attention visualization tool
#36630 merged
Mar 19, 2025 -
[Generation] remove leftover code from end-to-end compilation
#36685 merged
Mar 19, 2025 -
Fix Device map for bitsandbytes tests
#36800 merged
Mar 19, 2025 -
Remove "dist": "loadfile" for pytest for CircleCI jobs
#36811 merged
Mar 19, 2025 -
fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model
#36572 merged
Mar 19, 2025 -
Expectations test utils
#36569 merged
Mar 18, 2025 -
[generate] ✨ vectorized beam search ✨
#35802 merged
Mar 18, 2025 -
Support custom docstrings in modular
#36726 merged
Mar 18, 2025 -
Fix chameleon's TypeError because inputs_embeds may None
#36673 merged
Mar 18, 2025 -
Fix casting dtype for quantization
#36799 merged
Mar 18, 2025 -
Fix Mistral3 tests
#36797 merged
Mar 18, 2025 -
Loading optimizations
#36742 merged
Mar 18, 2025 -
Update SHA for tj-actions/changed-files
#36795 merged
Mar 18, 2025 -
fix hqq due to recent modeling changes
#36771 merged
Mar 18, 2025 -
Add Mistral3
#36790 merged
Mar 18, 2025 -
Fix gemma3_text tokenizer in mapping
#36793 merged
Mar 18, 2025 -
Fixing typo in gemma3 image_processor_fast and adding a small test
#36776 merged
Mar 18, 2025 -
chore: fix typos in tests directory
#36785 merged
Mar 18, 2025 -
fix typos in the tests directory
#36717 merged
Mar 17, 2025 -
doc: Clarify is_decoder usage in PretrainedConfig documentation
#36724 merged
Mar 17, 2025 -
[docs] Update README
#36265 merged
Mar 17, 2025 -
[CI] remove redundant checks in test_eager_matches_sdpa_inference
#36740 merged
Mar 17, 2025 -
[MINOR:TYPO] Update hubert.md
#36733 merged
Mar 17, 2025 -
Fix TrainingArguments.torch_empty_cache_steps post_init check
#36734 merged
Mar 17, 2025 -
Fix test isolation for clear_import_cache utility
#36345 merged
Mar 17, 2025 -
fix xpu tests
#36656 merged
Mar 17, 2025 -
Allow ray datasets to be used with trainer
#36699 merged
Mar 17, 2025 -
fix can_generate
#36570 merged
Mar 17, 2025 -
enable/disable compile for quants methods
#36519 merged
Mar 17, 2025 -
🚨🚨🚨 Fix sdpa in sam and refactor relative position embeddings
#36422 merged
Mar 17, 2025
60 Pull requests opened by 49 people
-
Remove extra tensor clone in PyTorch code
#36748 opened
Mar 16, 2025 -
🌐 [i18n-KO] Translated `qwen2_vl.md` to Korean
#36750 opened
Mar 16, 2025 -
Add Qwen2.5-Omni
#36752 opened
Mar 16, 2025 -
🌐 [i18n-KO] Translated 'serving.md' to Korean
#36756 opened
Mar 17, 2025 -
🌐 [i18n-KO] Translated `gpu_selection.md` to Korean
#36757 opened
Mar 17, 2025 -
feat: expose the strict flag to allow catching missing model layers while loading a checkpoint
#36760 opened
Mar 17, 2025 -
🌐 [i18n-KO] Translated `electra.md` to Korean
#36763 opened
Mar 17, 2025 -
Add support for audios in apply_chat_template
#36770 opened
Mar 17, 2025 -
Export for Phi4-mini
#36780 opened
Mar 18, 2025 -
Use public export API on torch 2.5 and future
#36781 opened
Mar 18, 2025 -
Fix attention_mask dimension issue in GPT2Model
#36782 opened
Mar 18, 2025 -
Create modeling_ngen3.py for NGen3
#36787 opened
Mar 18, 2025 -
Update configuration_auto.py for NGen3
#36791 opened
Mar 18, 2025 -
Refactor `return_dict` logic to remove complicated if/else paths
#36794 opened
Mar 18, 2025 -
[don't merge] check tokenizer ci job
#36796 opened
Mar 18, 2025 -
Add Granite Speech Support
#36801 opened
Mar 18, 2025 -
Add long vita
#36807 opened
Mar 19, 2025 -
Support loading custom models (`trust_remote_code=True`) in offline mode from local
#36808 opened
Mar 19, 2025 -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 opened
Mar 19, 2025 -
check tok
#36818 opened
Mar 19, 2025 -
Dummies
#36827 opened
Mar 19, 2025 -
[Modeling] Load FP8 safetensors such as DeepSeek
#36828 opened
Mar 19, 2025 -
gemma3 fp16 fix
#36832 opened
Mar 19, 2025 -
Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag
#36835 opened
Mar 19, 2025 -
Remove unnecessary attr assignment
#36837 opened
Mar 19, 2025 -
Move `return_dict` logic into `can_return_tuple` decorator
#36838 opened
Mar 19, 2025 -
Haocheng lu
#36839 opened
Mar 19, 2025 -
Fix Optional type annotation
#36841 opened
Mar 20, 2025 -
fix pegasus init weights and other copied models
#36844 opened
Mar 20, 2025 -
[Utils] torch version checks optionally accept dev versions
#36847 opened
Mar 20, 2025 -
Add support for specifying revisions when pushing to Hub via internal Trainer call
#36852 opened
Mar 20, 2025 -
[2/N] Use pyupgrade --py39-plus to improve code
#36857 opened
Mar 20, 2025 -
[DON'T MERGE] test doc builder
#36860 opened
Mar 20, 2025 -
fix: prevent input side-effects in processor text args
#36866 opened
Mar 20, 2025 -
Only count num items in batch when needed
#36867 opened
Mar 20, 2025 -
Fix warning message for PEFT models in text-generation pipeline #36783
#36868 opened
Mar 20, 2025 -
[docs] Fix image link
#36869 opened
Mar 20, 2025 -
Improve Model Download Speeds By ~3x For Large Models
#36870 opened
Mar 21, 2025 -
Bump ruff to 0.11.1
#36871 opened
Mar 21, 2025 -
Correct Condition for Pixel Values in Chameleon PR to Address Embedding and Token Mismatch
#36873 opened
Mar 21, 2025 -
[Fix] Add `original_max_position_embeddings` to YARN rope_scaling optional keys
#36877 opened
Mar 21, 2025 -
Adding Qwen3 and Qwen3MoE
#36878 opened
Mar 21, 2025 -
Fix `resume_from_checkpoint` not recognising `"last-checkpoint"`
#36883 opened
Mar 21, 2025 -
Optimize `to_py_obj` for python-native numeric lists and scalars
#36885 opened
Mar 21, 2025 -
Fix warning message for PEFT models in text-generation pipeline #36783
#36887 opened
Mar 21, 2025 -
Allow easy registration of custom attention functions
#36889 opened
Mar 21, 2025 -
Fix processor kwargs qwen2 vl
#36890 opened
Mar 21, 2025 -
Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0)
#36891 opened
Mar 21, 2025 -
[WIP] Computer vision util: vision visualizer
#36892 opened
Mar 21, 2025 -
fix Gemma3 Config
#36893 opened
Mar 21, 2025 -
Enable tracing for Moshi
#36894 opened
Mar 21, 2025 -
Add RF-DETR
#36895 opened
Mar 21, 2025 -
tests: fix asyncio.wait() usage for python>=3.11
#36898 opened
Mar 22, 2025 -
Adding ArlowGPT
#36899 opened
Mar 22, 2025 -
Add NGen3
#36901 opened
Mar 22, 2025 -
Added support for seed in `DataCollatorForWholeWordMask`
#36903 opened
Mar 22, 2025 -
LogfireCallback: Integrating Logfire with Hugging Face’s Trainer
#36905 opened
Mar 22, 2025 -
Fix torch version guard at import
#36907 opened
Mar 22, 2025 -
fix cached file error when repo type is dataset
#36909 opened
Mar 23, 2025 -
Fix typos
#36910 opened
Mar 23, 2025
43 Issues closed by 16 people
-
Unable to export GLM models to ONNX
#35021 closed
Mar 23, 2025 -
`modular_model_converter` can not handle objects import via try - except
#35414 closed
Mar 23, 2025 -
`TFViTModel` and `interpolate_pos_encoding=True`
#36155 closed
Mar 23, 2025 -
Inference with FSDP during training affects checkpoints
#34530 closed
Mar 22, 2025 -
[BART] Cannot copy out of meta tensor; no data!
#36247 closed
Mar 21, 2025 -
Bug introduced in `from_pretrained` `v4.48.3`..`v4.49.0`
#36258 closed
Mar 21, 2025 -
<spam>
#36876 closed
Mar 21, 2025 -
Gemma3 (and Paligemma) position_ids 1-indexed?
#36856 closed
Mar 21, 2025 -
Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
#36124 closed
Mar 21, 2025 -
Allow setting a seed for DataCollatorForLanguageModeling
#36357 closed
Mar 20, 2025 -
LlamaAttention has no attribute `rotary_emb` (4.50.0.dev0)
#36758 closed
Mar 20, 2025 -
GPT2 repetition of words in output
#36848 closed
Mar 20, 2025 -
num_items_in_batch unexpected in vision encoder decoder
#36744 closed
Mar 20, 2025 -
Convert RT-DETR model to coreml
#35905 closed
Mar 20, 2025 -
[bug] fast_image_processor register error
#36715 closed
Mar 19, 2025 -
When what needs to be loaded is in the cache directory, there is no need to make a request to the remote
#36762 closed
Mar 19, 2025 -
In the _speculative_sampling function, it seems that the "squeeze" method is being used incorrectly.
#36810 closed
Mar 19, 2025 -
AttributeError: 'Gemma3Config' object has no attribute 'vocab_size'
#36683 closed
Mar 19, 2025 -
text-to-video_app
#36747 closed
Mar 19, 2025 -
model from_pretrained bug in 4.50.dev0 in these days
#36506 closed
Mar 19, 2025 -
Subtle difference with Pytorch AdamW?
#35504 closed
Mar 19, 2025 -
Qwen2VL exhibits significant performance differences under different attention implementations.
#35749 closed
Mar 19, 2025 -
[Phi-3-mini-128k-instruct] Difference of encodings for Slow and Fast Tokenizer
#35973 closed
Mar 19, 2025 -
Training loss not showing with trainer
#36102 closed
Mar 19, 2025 -
Gemma3 minimal fine tuning example?
#36714 closed
Mar 18, 2025 -
Shape mismatch in RoPE embeddings gpt_neox model when rotary_ndims is odd
#35233 closed
Mar 18, 2025 -
incorrect special_tokens_mask
#35897 closed
Mar 18, 2025 -
Llama tokenizer newline character inconsistency
#35923 closed
Mar 18, 2025 -
flex_attention does not output the full attention_weights with output_attention option
#36096 closed
Mar 18, 2025 -
bug in save checkpoint
#36099 closed
Mar 18, 2025 -
qwen2_5_vl processor padding side is wrong.
#36100 closed
Mar 18, 2025 -
ValueError: weight is on the meta device, we need a `value` to put in on 0. `Gemma3`
#36766 closed
Mar 17, 2025 -
Misleading documentation for `is_decoder` configuration parameter
#36482 closed
Mar 17, 2025 -
On MoE implementation in HuggingFace
#36730 closed
Mar 17, 2025 -
bus error on version 4.43.0 with pretrained community CLIP model - MacOS
#33357 closed
Mar 17, 2025 -
Cannot load siglip2 processor
#36665 closed
Mar 16, 2025 -
SFTConfig.__init__() got an unexpected keyword argument 'optimizers'
#36749 closed
Mar 16, 2025 -
Model.generate use_cache=True generates different results than use_cache=False
#36536 closed
Mar 16, 2025
41 Issues opened by 40 people
-
`lm_head.weight` missing from `convert_mistral_weights_to_hf.STATE_DICT_MAPPING`
#36908 opened
Mar 23, 2025 -
MacOs: register_pytree_node got an unexpected keyword argument 'flatten_with_keys_fn'
#36906 opened
Mar 22, 2025 -
PixtralVisionModel does not support Flash Attention 2.0 yet
#36904 opened
Mar 22, 2025 -
Warning: "No label_names provided for PeftModel" persists despite dataset containing "labels" column
#36902 opened
Mar 22, 2025 -
groot n1
#36900 opened
Mar 22, 2025 -
GPT2Model model output inconsistency between different transformers versions
#36897 opened
Mar 22, 2025 -
Forced to hit `UserWarning` when generating with `temperature=0`
#36896 opened
Mar 21, 2025 -
Issue with update
#36888 opened
Mar 21, 2025 -
Florence2 stopped working after upgrade to 4.50.0 ("Unrecognized configuration class")
#36886 opened
Mar 21, 2025 -
Add RF-DETR model
#36879 opened
Mar 21, 2025 -
Qwen2-VL-7B-Instruct shape error when using TP=4
#36875 opened
Mar 21, 2025 -
Support for SpatialLM series model
#36874 opened
Mar 21, 2025 -
Optimize tokenizer.decode() Performance for `List[int]` Inputs
#36872 opened
Mar 21, 2025 -
Multiple processor classes have input side-effects
#36865 opened
Mar 20, 2025 -
Facing RunTime Attribute error while running different Flax models for RoFormer
#36854 opened
Mar 20, 2025 -
Tansfomers_model
#36846 opened
Mar 20, 2025 -
Unable to load google/siglip2-so400m-patch14-384/
#36845 opened
Mar 20, 2025 -
GOT-OCR2 docs indicate model can produce markdown, but it only produces LaTeX.
#36836 opened
Mar 19, 2025 -
Build for Windows and VS 2022 does not compile CUDA sources
#36830 opened
Mar 19, 2025 -
Support for Ovis2 models
#36824 opened
Mar 19, 2025 -
Gemma 3 is broken with fp16
#36822 opened
Mar 19, 2025 -
Need Option to Disable Flash Attention in VideoLLaMA2.1-7B-AV (SiglipVisionModel)
#36819 opened
Mar 19, 2025 -
Add EuroBert Model To Config
#36817 opened
Mar 19, 2025 -
Gemma3 can't be fine-tuned on multi-image examples
#36816 opened
Mar 19, 2025 -
Gemma3
#36815 opened
Mar 19, 2025 -
Not able to trace GPT2DoubleHeadsModel
#36812 opened
Mar 19, 2025 -
Logic Errors in Image_processing_gemma3_fast.py
#36806 opened
Mar 19, 2025 -
Qwen2VLForConditionalGeneration.from_pretrained() hangs with v0.50.0-dev0
#36803 opened
Mar 18, 2025 -
BERT is broken on `v4.49.0-Gemma-3`
#36802 opened
Mar 18, 2025 -
Design question for integrating new model to Transformers?
#36784 opened
Mar 18, 2025 -
Throw messages in text-generation task with deepseek r1 with PEFTModel
#36783 opened
Mar 18, 2025 -
Please support GGUF format for UMT5EncoderModel
#36774 opened
Mar 17, 2025 -
Inconsistent Documentation for `dataset_index` Requirement Across ViTPose Models
#36773 opened
Mar 17, 2025 -
Add Audio inputs available in apply_chat_template
#36769 opened
Mar 17, 2025 -
Source link to Ray Tune API outdated
#36765 opened
Mar 17, 2025 -
could not parse ModelProto from /home/imss/zxhhhh/llama-3-8b/tokenizer.model
#36764 opened
Mar 17, 2025 -
tj-actions/changed-files action compromised
#36761 opened
Mar 17, 2025 -
Add Gemma 3 For Sequence Classification
#36755 opened
Mar 16, 2025 -
Unable to load google/siglip2-base-patch16-naflex
#36754 opened
Mar 16, 2025 -
IdeficsProcessor cannot handle multiple images in one text
#36751 opened
Mar 16, 2025
130 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Janus model
#36053 commented on
Mar 20, 2025 • 42 new comments -
Samhq model addition
#35147 commented on
Mar 22, 2025 • 27 new comments -
Add StyleTTS 2
#35790 commented on
Mar 19, 2025 • 27 new comments -
add FlashAttentionKwargs and seq_idx to flat collator
#36456 commented on
Mar 21, 2025 • 24 new comments -
Add evolla rebase main
#36232 commented on
Mar 21, 2025 • 18 new comments -
Add InternVL (2.5 MPO)
#35968 commented on
Mar 19, 2025 • 13 new comments -
Add DeepSeek V2 Model into Transformers
#36400 commented on
Mar 20, 2025 • 12 new comments -
Add Doge model
#35891 commented on
Mar 22, 2025 • 12 new comments -
Add TimesFM Time Series Forecasting Model
#34082 commented on
Mar 19, 2025 • 11 new comments -
Introduce modular files for speech models
#35902 commented on
Mar 21, 2025 • 10 new comments -
Add MLCD model
#36182 commented on
Mar 19, 2025 • 10 new comments -
Add support for MiniMax's MiniMax-Text-01
#35831 commented on
Mar 20, 2025 • 10 new comments -
[Feature] Support using FlashAttention2 on Ascend NPU
#36696 commented on
Mar 22, 2025 • 9 new comments -
Create and Expose SamVisionModel as public for better accessibility
#36493 commented on
Mar 19, 2025 • 7 new comments -
Support batch size > 1 image-text inference
#36682 commented on
Mar 21, 2025 • 5 new comments -
Add FAST
#35476 commented on
Mar 21, 2025 • 4 new comments -
Add internlm3 dense
#35694 commented on
Mar 19, 2025 • 4 new comments -
Support `return_tensors` in audio chat templates
#34601 commented on
Mar 18, 2025 • 3 new comments -
Fixes DynamicCache export issues due to control flow and inplace modifications
#36652 commented on
Mar 20, 2025 • 3 new comments -
fix: support grad clipping for TP through replicating non-sharded modules
#36132 commented on
Mar 21, 2025 • 3 new comments -
Add index selection for `output_hidden_states`
#33705 commented on
Mar 18, 2025 • 3 new comments -
Add Distill Any Depth
#36614 commented on
Mar 21, 2025 • 2 new comments -
[Whisper] 🚨 Fix pipeline word timestamp: timestamp token is end of token time !!!
#36632 commented on
Mar 20, 2025 • 2 new comments -
Fix long lagging when streaming text without spaces and NJK chars
#36708 commented on
Mar 19, 2025 • 2 new comments -
Fix generation using flash-attention and static cache
#36729 commented on
Mar 17, 2025 • 2 new comments -
Integrate xlstm cleanly.
#35377 commented on
Mar 18, 2025 • 1 new comment -
Export T5 (encoder-decoder) to ExecuTorch
#36486 commented on
Mar 18, 2025 • 1 new comment -
Support QuestionAnswering Module for ModernBert based models.
#35566 commented on
Mar 22, 2025 • 1 new comment -
Add support for DeepseekAI's DeepseekVL
#36248 commented on
Mar 23, 2025 • 0 new comments -
Pipeline: fix unnecessary warnings
#35753 commented on
Mar 17, 2025 • 0 new comments -
Fix the eval_use_gather_object flag usage
#36214 commented on
Mar 18, 2025 • 0 new comments -
fix: condition bos_token_id and space as token
#36211 commented on
Mar 19, 2025 • 0 new comments -
(ugly) Use `parallelism=4` for `check_repository_consistency`
#36197 commented on
Mar 20, 2025 • 0 new comments -
Flash Attention v3
#36190 commented on
Mar 21, 2025 • 0 new comments -
fix immediate quantization of the first token in QuantizedCache
#35760 commented on
Mar 20, 2025 • 0 new comments -
[MLU] Fix FA2 check error, remove deepspeed-mlu deps.
#36159 commented on
Mar 20, 2025 • 0 new comments -
Remove head mask in generative models
#35786 commented on
Mar 19, 2025 • 0 new comments -
[WIP] add deepseek-v3
#35926 commented on
Mar 20, 2025 • 0 new comments -
Fix Mask2Former Weight Initialization Issues #35877
#35904 commented on
Mar 21, 2025 • 0 new comments -
Several fixes related to rotary position embeddings
#35901 commented on
Mar 19, 2025 • 0 new comments -
[WP] PagedAttention + Prefix Cache for FlashAttention2
#36737 commented on
Mar 21, 2025 • 0 new comments -
Convert _VALID_DICT_FIELDS to class attribute for shared dict parsing in subclasses
#36736 commented on
Mar 22, 2025 • 0 new comments -
Fix image processor speedup fixed
#36732 commented on
Mar 17, 2025 • 0 new comments -
Add CSM model
#36719 commented on
Mar 21, 2025 • 0 new comments -
fix whisper re-compile
#36712 commented on
Mar 21, 2025 • 0 new comments -
[WIP] Add DINO DETR Model to HuggingFace Transformers
#36711 commented on
Mar 17, 2025 • 0 new comments -
prune LM Head for USD
#36695 commented on
Mar 19, 2025 • 0 new comments -
don't pass NoneType for keep_in_fp32_modules
#36675 commented on
Mar 20, 2025 • 0 new comments -
Update quantizer_bnb_4bit.py
#36669 commented on
Mar 20, 2025 • 0 new comments -
[i18n-KO] Translated `keypoint_detection.md` to Korean
#36649 commented on
Mar 17, 2025 • 0 new comments -
Fix device issue in modeling_qwen2
#36647 commented on
Mar 21, 2025 • 0 new comments -
Refine parameter type annotations
#36644 commented on
Mar 20, 2025 • 0 new comments -
[WiP] Add Aimv2 model
#36625 commented on
Mar 22, 2025 • 0 new comments -
[WIP] Add support to load models with transforms
#36621 commented on
Mar 23, 2025 • 0 new comments -
Fixed 30s timestamp resets in Whisper long-form transcription
#36612 commented on
Mar 20, 2025 • 0 new comments -
Allow saving and loading multiple "raw" chat template files
#36588 commented on
Mar 20, 2025 • 0 new comments -
fix for loading gguf quantized model
#36563 commented on
Mar 20, 2025 • 0 new comments -
[Validation] First implementation of `@strict_dataclass` from `huggingface_hub`
#36534 commented on
Mar 17, 2025 • 0 new comments -
Add an event related to forward in the TrainerCallback
#36496 commented on
Mar 18, 2025 • 0 new comments -
Add NVIDIA Cosmos
#36476 commented on
Mar 19, 2025 • 0 new comments -
Customize docstrings fast image processor
#36466 commented on
Mar 17, 2025 • 0 new comments -
Add PlainDETR
#36437 commented on
Mar 18, 2025 • 0 new comments -
Fix: Use config.use_sliding_window instead of config.sliding_window
#36377 commented on
Mar 21, 2025 • 0 new comments -
Add EfficientLoFTR model
#36355 commented on
Mar 19, 2025 • 0 new comments -
🚨Deprecate legacy argument for image-text-to-text models and adopt new behavior by default
#36307 commented on
Mar 20, 2025 • 0 new comments -
[`ModernBERT`] Never save 'reference_compile' config; should be set based on end user
#36305 commented on
Mar 20, 2025 • 0 new comments -
enable tp on CPU
#36299 commented on
Mar 19, 2025 • 0 new comments -
Update composition flag usage
#36263 commented on
Mar 19, 2025 • 0 new comments -
Add D-FINE Model into Transformers
#36261 commented on
Mar 20, 2025 • 0 new comments -
[docs] add return_timestamps=True for Whisper long-form transcription
#35633 commented on
Mar 20, 2025 • 0 new comments -
CVE-2024-11392 - AWS Scanner and Trivy Flagging Transformers 4.48.1 as Vulnerable
#36041 commented on
Mar 20, 2025 • 0 new comments -
Incompatibility in flash_attention_2 + Llama + Transformers>=4.43 + Autocast to fp16
#36224 commented on
Mar 20, 2025 • 0 new comments -
IsADirectoryError when training with tqdm enabled for trainer
#34766 commented on
Mar 20, 2025 • 0 new comments -
`AutoModelForCasualLM.from_pretrained()` exits without warning/error
#36245 commented on
Mar 19, 2025 • 0 new comments -
Marian RNN conversion support
#36651 commented on
Mar 19, 2025 • 0 new comments -
Stop output to stdout in streamers.py methods
#36562 commented on
Mar 19, 2025 • 0 new comments -
Inconsistent output lengths when `max_length=20` is set implicitly vs explicitly in `generate()`
#35765 commented on
Mar 19, 2025 • 0 new comments -
A warning message showing that `MultiScaleDeformableAttention.so` is not found in `/root/.cache/torch_extensions` if `ninja` is installed with `transformers`
#35349 commented on
Mar 19, 2025 • 0 new comments -
Difficulties with multi-GPU Inferencing
#36634 commented on
Mar 19, 2025 • 0 new comments -
Cryptic error when using AutoTokenizer with SentencePiece tokenizers without sentencepiece installed
#36291 commented on
Mar 19, 2025 • 0 new comments -
Model Card to include key information (e.g. max_sequence_length, etc.)
#36743 commented on
Mar 19, 2025 • 0 new comments -
Unable to deploy Gemma 3 on AWS SageMaker due to lack of support in tranfomers release
#36738 commented on
Mar 19, 2025 • 0 new comments -
Error when tokenizer is set to string: `AttributeError: 'str' object has no attribute 'pad_token_id'`
#36731 commented on
Mar 19, 2025 • 0 new comments -
Custom 4D tensor caused shape mismatch error
#35290 commented on
Mar 18, 2025 • 0 new comments -
Gemma 3 1B - TypeError: 'NoneType' object is not callable
#36745 commented on
Mar 18, 2025 • 0 new comments -
Tensor size mismatch when trying to run RT-DETR on multiple gpus
#33165 commented on
Mar 18, 2025 • 0 new comments -
Issue with Progressive Generation Using inputs_embeds and past_key_values
#35707 commented on
Mar 18, 2025 • 0 new comments -
RWKV CUDA error: an illegal memory access was encountered during training from scratch
#35805 commented on
Mar 18, 2025 • 0 new comments -
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 commented on
Mar 18, 2025 • 0 new comments -
Token healing throws error with "Qwen/Qwen2.5-Coder-7B-Instruct"
#36210 commented on
Mar 18, 2025 • 0 new comments -
[bug] use_gather_object is not respected after the first eval in trainer
#36213 commented on
Mar 18, 2025 • 0 new comments -
Model trained with Flash Attention 2.0 raises "RuntimeError: query and key must have the same dtype" when generating
#30019 commented on
Mar 18, 2025 • 0 new comments -
lm_head parameters missing from named_parameters() in Qwen2.5-VL-3B-Instruct model
#36598 commented on
Mar 17, 2025 • 0 new comments -
SDPA `is_causal=False` has no effect due to `LlamaModel._prepare_4d_causal_attention_mask_with_cache_position`
#36150 commented on
Mar 17, 2025 • 0 new comments -
model.generate function is not compatible with custom position_ids
#36510 commented on
Mar 17, 2025 • 0 new comments -
MultiTask Classification and label_names on Trainer
#33193 commented on
Mar 17, 2025 • 0 new comments -
The parameter 'text' may be None as the comments says, there is a confuse.
#36667 commented on
Mar 17, 2025 • 0 new comments -
../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [267,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
#33985 commented on
Mar 17, 2025 • 0 new comments -
torchrun breaks with load_model_at_end and with metric_for_best_model=eval_f1 on question_answering example
#30819 commented on
Mar 17, 2025 • 0 new comments -
Offline mode doesn't work with models that require `trust_remote_code=True`
#34855 commented on
Mar 16, 2025 • 0 new comments -
Fix Batch Size Mismatch When Using `crops_n_layers` in `mask-generation` Pipeline #35530
#35627 commented on
Mar 20, 2025 • 0 new comments -
Process inputs directly in apply_chat_template in image-text-to-text pipeline
#35616 commented on
Mar 19, 2025 • 0 new comments -
Add Relation DETR
#34900 commented on
Mar 20, 2025 • 0 new comments -
Bye bye env vars, keep everything as configs
#34886 commented on
Mar 19, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Mar 21, 2025 • 0 new comments -
[`AutoDocstring`] Based on inspect parsing of the signature
#33771 commented on
Mar 17, 2025 • 0 new comments -
Trainer: add predict with generate
#32346 commented on
Mar 21, 2025 • 0 new comments -
Improve support for image generation with Chameleon & Anole
#32013 commented on
Mar 19, 2025 • 0 new comments -
safetensor/mmap memory leak when per-layer weights are converted do other dtypes
#34366 commented on
Mar 23, 2025 • 0 new comments -
tensor parallel training bug
#36296 commented on
Mar 23, 2025 • 0 new comments -
Bug about num_update_steps_per_epoch in function _inner_training_loop
#36297 commented on
Mar 23, 2025 • 0 new comments -
DS3 zero3_save_16bit_model is not compatible with resume_from_checkpoint
#36317 commented on
Mar 23, 2025 • 0 new comments -
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration
#10105 commented on
Mar 23, 2025 • 0 new comments -
Error during processing: MllamaForCausalLM does not support Flash Attention 2.0 yet.
#36557 commented on
Mar 22, 2025 • 0 new comments -
flash_attention_2 2.7.2.post1 seems to crash when using `torch.compile` and `DataCollatorWithFlattening`
#35588 commented on
Mar 22, 2025 • 0 new comments -
rework `test_multi_gpu_data_parallel_forward`
#31087 commented on
Mar 22, 2025 • 0 new comments -
model.gradient_checkpointing_enable() makes loss.requires_grad be False
#35826 commented on
Mar 22, 2025 • 0 new comments -
Torch -> ONNX doesn't work after upgrading transformers to 4.49.0
#36276 commented on
Mar 22, 2025 • 0 new comments -
AttributeError: 'dict' object has no attribute 'to_dict'; for Inferencing Lora Merged Qwen/Qwen2.5-VL-3B-Instruct
#36281 commented on
Mar 22, 2025 • 0 new comments -
past_key_value(s) name inconsistency causing problems
#36290 commented on
Mar 22, 2025 • 0 new comments -
[Bugs] RuntimeError: No CUDA GPUs are available in transformers v4.48.0 or above when running Ray RLHF example
#36295 commented on
Mar 22, 2025 • 0 new comments -
Add Magma from Microsoft to Transformers
#36629 commented on
Mar 21, 2025 • 0 new comments -
AutoModel from_pretrained does not recursively download relative imports
#36653 commented on
Mar 21, 2025 • 0 new comments -
Qwen2VLForConditionalGeneration doesn't work with MPS devices
#36413 commented on
Mar 21, 2025 • 0 new comments -
Mask2Former _init_weights
#35877 commented on
Mar 21, 2025 • 0 new comments -
Support Flex Attention for encoder only models (XLMRoberta, ModernBERT etc...)
#36697 commented on
Mar 21, 2025 • 0 new comments -
Community contribution: Adding GGUF support for more architectures
#33260 commented on
Mar 21, 2025 • 0 new comments -
cannot import name 'is_timm_config_dict' from 'transformers.utils.generic'
#36068 commented on
Mar 21, 2025 • 0 new comments -
ValueError: Unrecognized image processor in Qwen/Qwen2.5-VL-3B-Instruct.
#36193 commented on
Mar 21, 2025 • 0 new comments -
modeling_phi3 errors with AttributeError: 'DynamicCache' object has no attribute 'get_max_length'
#36071 commented on
Mar 20, 2025 • 0 new comments