Insights: huggingface/transformers
Overview
3 Releases published by 1 person
-
v4.49.0-Gemma-3 Gemma 3 (Based on v4.49.0)
published
Mar 18, 2025 -
v4.49.0-Mistral-3 Mistral 3 (Based on v4.49.0)
published
Mar 18, 2025 -
v4.50.0 Release v4.50.0
published
Mar 21, 2025
89 Pull requests merged by 54 people
-
Update installation.md
#36826 merged
Mar 21, 2025 -
[docs] Model docs
#36469 merged
Mar 21, 2025 -
Fix Pan and Scan on batched images Gemma3
#36864 merged
Mar 21, 2025 -
Simplify keep_in_fp32_modules logic
#36722 merged
Mar 21, 2025 -
fix: loss computation after embeddings resize - mllama
#36840 merged
Mar 21, 2025 -
Fix: dtype cannot be str
#36262 merged
Mar 21, 2025 -
Minor Gemma 3 fixes
#36884 merged
Mar 21, 2025 -
Use deformable_detr kernel from the Hub
#36853 merged
Mar 21, 2025 -
Gemma 3 tests expect greedy decoding
#36882 merged
Mar 21, 2025 -
🔴 🔴 🔴 supersede paligemma forward to shift pos id indexing
#36859 merged
Mar 21, 2025 -
[generate] model defaults being inherited only happens for newer models
#36881 merged
Mar 21, 2025 -
Revert "Update deprecated Jax calls (#35919)"
#36880 merged
Mar 21, 2025 -
Make ViTPooler configurable
#36517 merged
Mar 21, 2025 -
chore: fix typos in the tests directory
#36813 merged
Mar 21, 2025 -
Remove call to .item in get_batch_samples
#36861 merged
Mar 21, 2025 -
FIX FSDP plugin update for QLoRA
#36720 merged
Mar 21, 2025 -
[CI] doc builder without custom image
#36862 merged
Mar 21, 2025 -
Mllama: raise better error
#35934 merged
Mar 21, 2025 -
Refactor Aya Vision with modular
#36688 merged
Mar 20, 2025 -
Add support for seed in DataCollatorForLanguageModeling
#36497 merged
Mar 20, 2025 -
[CI] fix update metadata job
#36850 merged
Mar 20, 2025 -
Gemma3: fix test
#36820 merged
Mar 20, 2025 -
[torchao] revert to get_apply_tensor_subclass
#36849 merged
Mar 20, 2025 -
Add model visual debugger
#36798 merged
Mar 20, 2025 -
Add Prompt Depth Anything Model
#35401 merged
Mar 20, 2025 -
Refactor Attention implementation for ViT-based models
#36545 merged
Mar 20, 2025 -
DeepSpeed tensor parallel+ZeRO
#36825 merged
Mar 20, 2025 -
Support loading Quark quantized models in Transformers
#36372 merged
Mar 20, 2025 -
Use pyupgrade --py39-plus to improve code
#36843 merged
Mar 20, 2025 -
Fix hqq skipped modules and dynamic quant
#36821 merged
Mar 20, 2025 -
Fix ONNX export for sequence classification head
#36332 merged
Mar 20, 2025 -
Shieldgemma2
#36678 merged
Mar 20, 2025 -
Fix: remove the redundant snippet of _whole_word_mask
#36759 merged
Mar 20, 2025 -
Gemma 3: Adding explicit GenerationConfig and refactoring conversion …
#36833 merged
Mar 20, 2025 -
Fix import for torch 2.0, 2.1 - guard typehint for "device_mesh"
#36768 merged
Mar 20, 2025 -
Update min safetensors bis
#36823 merged
Mar 20, 2025 -
[generate] clarify docstrings: when to inherit GenerationMixin
#36605 merged
Mar 20, 2025 -
[modular] Sort modular skips
#36304 merged
Mar 20, 2025 -
Pass state dict
#35234 merged
Mar 20, 2025 -
[qwen2 audio] remove redundant code and update docs
#36282 merged
Mar 20, 2025 -
Update deprecated Jax calls
#35919 merged
Mar 20, 2025 -
Fix fp16 ONNX export for RT-DETR and RT-DETRv2
#36460 merged
Mar 20, 2025 -
Pass num_items_in_batch directly to loss computation
#36753 merged
Mar 20, 2025 -
Saving Trainer.collator.tokenizer when Trainer.processing_class is None
#36552 merged
Mar 20, 2025 -
fix tiktoken convert to pass AddedToken to Tokenizer
#36566 merged
Mar 20, 2025 -
[ForCausalLMLoss] allow users to pass shifted labels
#36607 merged
Mar 20, 2025 -
Disable inductor config setter by default
#36608 merged
Mar 20, 2025 -
Fix swanlab global step
#36728 merged
Mar 20, 2025 -
rewrite main method in Qwen2, making it more clear
#36772 merged
Mar 20, 2025 -
Move the warning to the documentation for DataCollatorWithFlattening
#36707 merged
Mar 20, 2025 -
Remove our AdamW implementation
#36177 merged
Mar 19, 2025 -
Update configuration_qwen2.py
#36735 merged
Mar 19, 2025 -
quick fix fast_image_processor register error
#36716 merged
Mar 19, 2025 -
Add Space to Bitsandbytes doc
#36834 merged
Mar 19, 2025 -
Support tracable dynamicKVcache
#36311 merged
Mar 19, 2025 -
One more fix for reviewer assignment
#36829 merged
Mar 19, 2025 -
[gemma 3] multimodal checkpoints + AutoModelForCausalLM
#36741 merged
Mar 19, 2025 -
enable OffloadedCache on XPU from PyTorch 2.7
#36654 merged
Mar 19, 2025 -
Add option for ao base configs
#36526 merged
Mar 19, 2025 -
Add attention visualization tool
#36630 merged
Mar 19, 2025 -
[Generation] remove leftover code from end-to-end compilation
#36685 merged
Mar 19, 2025 -
Fix Device map for bitsandbytes tests
#36800 merged
Mar 19, 2025 -
Remove "dist": "loadfile" for pytest for CircleCI jobs
#36811 merged
Mar 19, 2025 -
fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model
#36572 merged
Mar 19, 2025 -
Expectations test utils
#36569 merged
Mar 18, 2025 -
[generate] ✨ vectorized beam search ✨
#35802 merged
Mar 18, 2025 -
Support custom docstrings in modular
#36726 merged
Mar 18, 2025 -
Fix chameleon's TypeError because inputs_embeds may None
#36673 merged
Mar 18, 2025 -
Fix casting dtype for quantization
#36799 merged
Mar 18, 2025 -
Fix Mistral3 tests
#36797 merged
Mar 18, 2025 -
Loading optimizations
#36742 merged
Mar 18, 2025 -
Update SHA for tj-actions/changed-files
#36795 merged
Mar 18, 2025 -
fix hqq due to recent modeling changes
#36771 merged
Mar 18, 2025 -
Add Mistral3
#36790 merged
Mar 18, 2025 -
Fix gemma3_text tokenizer in mapping
#36793 merged
Mar 18, 2025 -
Fixing typo in gemma3 image_processor_fast and adding a small test
#36776 merged
Mar 18, 2025 -
chore: fix typos in tests directory
#36785 merged
Mar 18, 2025 -
fix typos in the tests directory
#36717 merged
Mar 17, 2025 -
doc: Clarify is_decoder usage in PretrainedConfig documentation
#36724 merged
Mar 17, 2025 -
[docs] Update README
#36265 merged
Mar 17, 2025 -
[CI] remove redundant checks in test_eager_matches_sdpa_inference
#36740 merged
Mar 17, 2025 -
[MINOR:TYPO] Update hubert.md
#36733 merged
Mar 17, 2025 -
Fix TrainingArguments.torch_empty_cache_steps post_init check
#36734 merged
Mar 17, 2025 -
Fix test isolation for clear_import_cache utility
#36345 merged
Mar 17, 2025 -
fix xpu tests
#36656 merged
Mar 17, 2025 -
Allow ray datasets to be used with trainer
#36699 merged
Mar 17, 2025 -
fix can_generate
#36570 merged
Mar 17, 2025 -
enable/disable compile for quants methods
#36519 merged
Mar 17, 2025 -
🚨🚨🚨 Fix sdpa in sam and refactor relative position embeddings
#36422 merged
Mar 17, 2025
60 Pull requests opened by 49 people
-
Remove extra tensor clone in PyTorch code
#36748 opened
Mar 16, 2025 -
🌐 [i18n-KO] Translated `qwen2_vl.md` to Korean
#36750 opened
Mar 16, 2025 -
Add Qwen2.5-Omni
#36752 opened
Mar 16, 2025 -
🌐 [i18n-KO] Translated 'serving.md' to Korean
#36756 opened
Mar 17, 2025 -
🌐 [i18n-KO] Translated `gpu_selection.md` to Korean
#36757 opened
Mar 17, 2025 -
feat: expose the strict flag to allow catching missing model layers while loading a checkpoint
#36760 opened
Mar 17, 2025 -
🌐 [i18n-KO] Translated `electra.md` to Korean
#36763 opened
Mar 17, 2025 -
Add support for audios in apply_chat_template
#36770 opened
Mar 17, 2025 -
Export for Phi4-mini
#36780 opened
Mar 18, 2025 -
Use public export API on torch 2.5 and future
#36781 opened
Mar 18, 2025 -
Fix attention_mask dimension issue in GPT2Model
#36782 opened
Mar 18, 2025 -
Create modeling_ngen3.py for NGen3
#36787 opened
Mar 18, 2025 -
Update configuration_auto.py for NGen3
#36791 opened
Mar 18, 2025 -
Refactor `return_dict` logic to remove complicated if/else paths
#36794 opened
Mar 18, 2025 -
[don't merge] check tokenizer ci job
#36796 opened
Mar 18, 2025 -
Add Granite Speech Support
#36801 opened
Mar 18, 2025 -
Add long vita
#36807 opened
Mar 19, 2025 -
Support loading custom models (`trust_remote_code=True`) in offline mode from local
#36808 opened
Mar 19, 2025 -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 opened
Mar 19, 2025 -
check tok
#36818 opened
Mar 19, 2025 -
Dummies
#36827 opened
Mar 19, 2025 -
[Modeling] Load FP8 safetensors such as DeepSeek
#36828 opened
Mar 19, 2025 -
gemma3 fp16 fix
#36832 opened
Mar 19, 2025 -
Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag
#36835 opened
Mar 19, 2025 -
Remove unnecessary attr assignment
#36837 opened
Mar 19, 2025 -
Move `return_dict` logic into `can_return_tuple` decorator
#36838 opened
Mar 19, 2025 -
Haocheng lu
#36839 opened
Mar 19, 2025 -
Fix Optional type annotation
#36841 opened
Mar 20, 2025 -
fix pegasus init weights and other copied models
#36844 opened
Mar 20, 2025 -
[Utils] torch version checks optionally accept dev versions
#36847 opened
Mar 20, 2025 -
Add support for specifying revisions when pushing to Hub via internal Trainer call
#36852 opened
Mar 20, 2025 -
[2/N] Use pyupgrade --py39-plus to improve code
#36857 opened
Mar 20, 2025 -
[DON'T MERGE] test doc builder
#36860 opened
Mar 20, 2025 -
fix: prevent input side-effects in processor text args
#36866 opened
Mar 20, 2025 -
Only count num items in batch when needed
#36867 opened
Mar 20, 2025 -
Fix warning message for PEFT models in text-generation pipeline #36783
#36868 opened
Mar 20, 2025 -
[docs] Fix image link
#36869 opened
Mar 20, 2025 -
Improve Model Download Speeds By ~3x For Large Models
#36870 opened
Mar 21, 2025 -
Bump ruff to 0.11.1
#36871 opened
Mar 21, 2025 -
Correct Condition for Pixel Values in Chameleon PR to Address Embedding and Token Mismatch
#36873 opened
Mar 21, 2025 -
[Fix] Add `original_max_position_embeddings` to YARN rope_scaling optional keys
#36877 opened
Mar 21, 2025 -
Adding Qwen3 and Qwen3MoE
#36878 opened
Mar 21, 2025 -
Fix `resume_from_checkpoint` not recognising `"last-checkpoint"`
#36883 opened
Mar 21, 2025 -
Optimize `to_py_obj` for python-native numeric lists and scalars
#36885 opened
Mar 21, 2025 -
Fix warning message for PEFT models in text-generation pipeline #36783
#36887 opened
Mar 21, 2025 -
Allow easy registration of custom attention functions
#36889 opened
Mar 21, 2025 -
Fix processor kwargs qwen2 vl
#36890 opened
Mar 21, 2025 -
Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0)
#36891 opened
Mar 21, 2025 -
[WIP] Computer vision util: vision visualizer
#36892 opened
Mar 21, 2025 -
fix Gemma3 Config
#36893 opened
Mar 21, 2025 -
Enable tracing for Moshi
#36894 opened
Mar 21, 2025 -
Add RF-DETR
#36895 opened
Mar 21, 2025 -
tests: fix asyncio.wait() usage for python>=3.11
#36898 opened
Mar 22, 2025 -
Adding ArlowGPT
#36899 opened
Mar 22, 2025 -
Add NGen3
#36901 opened
Mar 22, 2025 -
Added support for seed in `DataCollatorForWholeWordMask`
#36903 opened
Mar 22, 2025 -
LogfireCallback: Integrating Logfire with Hugging Face’s Trainer
#36905 opened
Mar 22, 2025 -
Fix torch version guard at import
#36907 opened
Mar 22, 2025 -
fix cached file error when repo type is dataset
#36909 opened
Mar 23, 2025 -
Fix typos
#36910 opened
Mar 23, 2025
43 Issues closed by 16 people
-
Unable to export GLM models to ONNX
#35021 closed
Mar 23, 2025 -
`modular_model_converter` can not handle objects import via try - except
#35414 closed
Mar 23, 2025 -
`TFViTModel` and `interpolate_pos_encoding=True`
#36155 closed
Mar 23, 2025 -
Inference with FSDP during training affects checkpoints
#34530 closed
Mar 22, 2025 -
[BART] Cannot copy out of meta tensor; no data!
#36247 closed
Mar 21, 2025 -
Bug introduced in `from_pretrained` `v4.48.3`..`v4.49.0`
#36258 closed
Mar 21, 2025 -
<spam>
#36876 closed
Mar 21, 2025 -
Gemma3 (and Paligemma) position_ids 1-indexed?
#36856 closed
Mar 21, 2025 -
Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
#36124 closed
Mar 21, 2025 -
Allow setting a seed for DataCollatorForLanguageModeling
#36357 closed
Mar 20, 2025 -
LlamaAttention has no attribute `rotary_emb` (4.50.0.dev0)
#36758 closed
Mar 20, 2025 -
GPT2 repetition of words in output
#36848 closed
Mar 20, 2025 -
num_items_in_batch unexpected in vision encoder decoder
#36744 closed
Mar 20, 2025 -
Convert RT-DETR model to coreml
#35905 closed
Mar 20, 2025 -
[bug] fast_image_processor register error
#36715 closed
Mar 19, 2025 -
When what needs to be loaded is in the cache directory, there is no need to make a request to the remote
#36762 closed
Mar 19, 2025 -
In the _speculative_sampling function, it seems that the "squeeze" method is being used incorrectly.
#36810 closed
Mar 19, 2025 -
AttributeError: 'Gemma3Config' object has no attribute 'vocab_size'
#36683 closed
Mar 19, 2025 -
text-to-video_app
#36747 closed
Mar 19, 2025 -
model from_pretrained bug in 4.50.dev0 in these days
#36506 closed
Mar 19, 2025 -
Subtle difference with Pytorch AdamW?
#35504 closed
Mar 19, 2025 -
Qwen2VL exhibits significant performance differences under different attention implementations.
#35749 closed
Mar 19, 2025 -
[Phi-3-mini-128k-instruct] Difference of encodings for Slow and Fast Tokenizer
#35973 closed
Mar 19, 2025 -
Training loss not showing with trainer
#36102 closed
Mar 19, 2025 -
Gemma3 minimal fine tuning example?
#36714 closed
Mar 18, 2025 -
Shape mismatch in RoPE embeddings gpt_neox model when rotary_ndims is odd
#35233 closed
Mar 18, 2025 -
incorrect special_tokens_mask
#35897 closed
Mar 18, 2025 -
Llama tokenizer newline character inconsistency
#35923 closed
Mar 18, 2025 -
flex_attention does not output the full attention_weights with output_attention option
#36096 closed
Mar 18, 2025 -
bug in save checkpoint
#36099 closed
Mar 18, 2025 -
qwen2_5_vl processor padding side is wrong.
#36100 closed
Mar 18, 2025 -
ValueError: weight is on the meta device, we need a `value` to put in on 0. `Gemma3`
#36766 closed
Mar 17, 2025 -
Misleading documentation for `is_decoder` configuration parameter
#36482 closed
Mar 17, 2025 -
On MoE implementation in HuggingFace
#36730 closed
Mar 17, 2025 -
bus error on version 4.43.0 with pretrained community CLIP model - MacOS
#33357 closed
Mar 17, 2025 -
Cannot load siglip2 processor
#36665 closed
Mar 16, 2025 -
SFTConfig.__init__() got an unexpected keyword argument 'optimizers'
#36749 closed
Mar 16, 2025 -
Model.generate use_cache=True generates different results than use_cache=False
#36536 closed
Mar 16, 2025
41 Issues opened by 40 people
-
`lm_head.weight` missing from `convert_mistral_weights_to_hf.STATE_DICT_MAPPING`
#36908 opened
Mar 23, 2025 -
MacOs: register_pytree_node got an unexpected keyword argument 'flatten_with_keys_fn'
#36906 opened
Mar 22, 2025 -
PixtralVisionModel does not support Flash Attention 2.0 yet
#36904 opened
Mar 22, 2025 -
Warning: "No label_names provided for PeftModel" persists despite dataset containing "labels" column
#36902 opened
Mar 22, 2025 -
groot n1
#36900 opened
Mar 22, 2025 -
GPT2Model model output inconsistency between different transformers versions
#36897 opened
Mar 22, 2025 -
Forced to hit `UserWarning` when generating with `temperature=0`
#36896 opened
Mar 21, 2025 -
Issue with update
#36888 opened
Mar 21, 2025 -
Florence2 stopped working after upgrade to 4.50.0 ("Unrecognized configuration class")
#36886 opened
Mar 21, 2025 -
Add RF-DETR model
#36879 opened
Mar 21, 2025 -
Qwen2-VL-7B-Instruct shape error when using TP=4
#36875 opened
Mar 21, 2025 -
Support for SpatialLM series model
#36874 opened
Mar 21, 2025 -
Optimize tokenizer.decode() Performance for `List[int]` Inputs
#36872 opened
Mar 21, 2025 -
Multiple processor classes have input side-effects
#36865 opened
Mar 20, 2025 -
Facing RunTime Attribute error while running different Flax models for RoFormer
#36854 opened
Mar 20, 2025 -
Tansfomers_model
#36846 opened
Mar 20, 2025 -
Unable to load google/siglip2-so400m-patch14-384/
#36845 opened
Mar 20, 2025 -
GOT-OCR2 docs indicate model can produce markdown, but it only produces LaTeX.
#36836 opened
Mar 19, 2025 -
Build for Windows and VS 2022 does not compile CUDA sources
#36830 opened
Mar 19, 2025 -
Support for Ovis2 models
#36824 opened
Mar 19, 2025 -
Gemma 3 is broken with fp16
#36822 opened
Mar 19, 2025 -
Need Option to Disable Flash Attention in VideoLLaMA2.1-7B-AV (SiglipVisionModel)
#36819 opened
Mar 19, 2025 -
Add EuroBert Model To Config
#36817 opened
Mar 19, 2025 -
Gemma3 can't be fine-tuned on multi-image examples
#36816 opened
Mar 19, 2025 -
Gemma3
#36815 opened
Mar 19, 2025 -
Not able to trace GPT2DoubleHeadsModel
#36812 opened
Mar 19, 2025 -
Logic Errors in Image_processing_gemma3_fast.py
#36806 opened
Mar 19, 2025 -
Qwen2VLForConditionalGeneration.from_pretrained() hangs with v0.50.0-dev0
#36803 opened
Mar 18, 2025 -
BERT is broken on `v4.49.0-Gemma-3`
#36802 opened
Mar 18, 2025 -
Design question for integrating new model to Transformers?
#36784 opened
Mar 18, 2025 -
Throw messages in text-generation task with deepseek r1 with PEFTModel
#36783 opened
Mar 18, 2025 -
Please support GGUF format for UMT5EncoderModel
#36774 opened
Mar 17, 2025 -
Inconsistent Documentation for `dataset_index` Requirement Across ViTPose Models
#36773 opened
Mar 17, 2025 -
Add Audio inputs available in apply_chat_template
#36769 opened
Mar 17, 2025 -
Source link to Ray Tune API outdated
#36765 opened
Mar 17, 2025 -
could not parse ModelProto from /home/imss/zxhhhh/llama-3-8b/tokenizer.model
#36764 opened
Mar 17, 2025 -
tj-actions/changed-files action compromised
#36761 opened
Mar 17, 2025 -
Add Gemma 3 For Sequence Classification
#36755 opened
Mar 16, 2025 -
Unable to load google/siglip2-base-patch16-naflex
#36754 opened
Mar 16, 2025 -
IdeficsProcessor cannot handle multiple images in one text
#36751 opened
Mar 16, 2025
130 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Janus model
#36053 commented on
Mar 20, 2025 • 42 new comments -
Samhq model addition
#35147 commented on
Mar 22, 2025 • 27 new comments -
Add StyleTTS 2
#35790 commented on
Mar 19, 2025 • 27 new comments -
add FlashAttentionKwargs and seq_idx to flat collator
#36456 commented on
Mar 21, 2025 • 24 new comments -
Add evolla rebase main
#36232 commented on
Mar 21, 2025 • 18 new comments -
Add InternVL (2.5 MPO)
#35968 commented on
Mar 19, 2025 • 13 new comments -
Add DeepSeek V2 Model into Transformers
#36400 commented on
Mar 20, 2025 • 12 new comments -
Add Doge model
#35891 commented on
Mar 22, 2025 • 12 new comments -
Add TimesFM Time Series Forecasting Model
#34082 commented on
Mar 19, 2025 • 11 new comments -
Introduce modular files for speech models
#35902 commented on
Mar 21, 2025 • 10 new comments -
Add MLCD model
#36182 commented on
Mar 19, 2025 • 10 new comments -
Add support for MiniMax's MiniMax-Text-01
#35831 commented on
Mar 20, 2025 • 10 new comments -
[Feature] Support using FlashAttention2 on Ascend NPU
#36696 commented on
Mar 22, 2025 • 9 new comments -
Create and Expose SamVisionModel as public for better accessibility
#36493 commented on
Mar 19, 2025 • 7 new comments -
Support batch size > 1 image-text inference
#36682 commented on
Mar 21, 2025 • 5 new comments -
Add FAST
#35476 commented on
Mar 21, 2025 • 4 new comments -
Add internlm3 dense
#35694 commented on
Mar 19, 2025 • 4 new comments -
Support `return_tensors` in audio chat templates
#34601 commented on
Mar 18, 2025 • 3 new comments -
Fixes DynamicCache export issues due to control flow and inplace modifications
#36652 commented on
Mar 20, 2025 • 3 new comments -
fix: support grad clipping for TP through replicating non-sharded modules
#36132 commented on
Mar 21, 2025 • 3 new comments -
Add index selection for `output_hidden_states`
#33705 commented on
Mar 18, 2025 • 3 new comments -
Add Distill Any Depth
#36614 commented on
Mar 21, 2025 • 2 new comments -
[Whisper] 🚨 Fix pipeline word timestamp: timestamp token is end of token time !!!
#36632 commented on
Mar 20, 2025 • 2 new comments -
Fix long lagging when streaming text without spaces and NJK chars
#36708 commented on
Mar 19, 2025 • 2 new comments -
Fix generation using flash-attention and static cache
#36729 commented on
Mar 17, 2025 • 2 new comments -
Integrate xlstm cleanly.
#35377 commented on
Mar 18, 2025 • 1 new comment -
Export T5 (encoder-decoder) to ExecuTorch
#36486 commented on
Mar 18, 2025 • 1 new comment -
Support QuestionAnswering Module for ModernBert based models.
#35566 commented on
Mar 22, 2025 • 1 new comment -
Add support for DeepseekAI's DeepseekVL
#36248 commented on
Mar 23, 2025 • 0 new comments -
Pipeline: fix unnecessary warnings
#35753 commented on
Mar 17, 2025 • 0 new comments -
Fix the eval_use_gather_object flag usage
#36214 commented on
Mar 18, 2025 • 0 new comments -
fix: condition bos_token_id and space as token
#36211 commented on
Mar 19, 2025 • 0 new comments -
(ugly) Use `parallelism=4` for `check_repository_consistency`
#36197 commented on
Mar 20, 2025 • 0 new comments -
Flash Attention v3
#36190 commented on
Mar 21, 2025 • 0 new comments -
fix immediate quantization of the first token in QuantizedCache
#35760 commented on
Mar 20, 2025 • 0 new comments -
[MLU] Fix FA2 check error, remove deepspeed-mlu deps.
#36159 commented on
Mar 20, 2025 • 0 new comments -
Remove head mask in generative models
#35786 commented on
Mar 19, 2025 • 0 new comments -
[WIP] add deepseek-v3
#35926 commented on
Mar 20, 2025 • 0 new comments -
Fix Mask2Former Weight Initialization Issues #35877
#35904 commented on
Mar 21, 2025 • 0 new comments -
Several fixes related to rotary position embeddings
#35901 commented on
Mar 19, 2025 • 0 new comments -
[WP] PagedAttention + Prefix Cache for FlashAttention2
#36737 commented on
Mar 21, 2025 • 0 new comments -
Convert _VALID_DICT_FIELDS to class attribute for shared dict parsing in subclasses
#36736 commented on
Mar 22, 2025 • 0 new comments -
Fix image processor speedup fixed
#36732 commented on
Mar 17, 2025 • 0 new comments -
Add CSM model
#36719 commented on
Mar 21, 2025 • 0 new comments -
fix whisper re-compile
#36712 commented on
Mar 21, 2025 • 0 new comments -
[WIP] Add DINO DETR Model to HuggingFace Transformers
#36711 commented on
Mar 17, 2025 • 0 new comments -
prune LM Head for USD
#36695 commented on
Mar 19, 2025 • 0 new comments -
don't pass NoneType for keep_in_fp32_modules
#36675 commented on
Mar 20, 2025 • 0 new comments -
Update quantizer_bnb_4bit.py
#36669 commented on
Mar 20, 2025 • 0 new comments -
[i18n-KO] Translated `keypoint_detection.md` to Korean
#36649 commented on
Mar 17, 2025 • 0 new comments -
Fix device issue in modeling_qwen2
#36647 commented on
Mar 21, 2025 • 0 new comments -
Refine parameter type annotations
#36644 commented on
Mar 20, 2025 • 0 new comments -
[WiP] Add Aimv2 model
#36625 commented on
Mar 22, 2025 • 0 new comments -
[WIP] Add support to load models with transforms
#36621 commented on
Mar 23, 2025 • 0 new comments -
Fixed 30s timestamp resets in Whisper long-form transcription
#36612 commented on
Mar 20, 2025 • 0 new comments -
Allow saving and loading multiple "raw" chat template files
#36588 commented on
Mar 20, 2025 • 0 new comments -
fix for loading gguf quantized model
#36563 commented on
Mar 20, 2025 • 0 new comments -
[Validation] First implementation of `@strict_dataclass` from `huggingface_hub`
#36534 commented on
Mar 17, 2025 • 0 new comments -
Add an event related to forward in the TrainerCallback
#36496 commented on
Mar 18, 2025 • 0 new comments -
Add NVIDIA Cosmos
#36476 commented on
Mar 19, 2025 • 0 new comments -
Customize docstrings fast image processor
#36466 commented on
Mar 17, 2025 • 0 new comments -
Add PlainDETR
#36437 commented on
Mar 18, 2025 • 0 new comments -
Fix: Use config.use_sliding_window instead of config.sliding_window
#36377 commented on
Mar 21, 2025 • 0 new comments -
Add EfficientLoFTR model
#36355 commented on
Mar 19, 2025 • 0 new comments -
🚨Deprecate legacy argument for image-text-to-text models and adopt new behavior by default
#36307 commented on
Mar 20, 2025 • 0 new comments -
[`ModernBERT`] Never save 'reference_compile' config; should be set based on end user
#36305 commented on
Mar 20, 2025 • 0 new comments -
enable tp on CPU
#36299 commented on
Mar 19, 2025 • 0 new comments -
Update composition flag usage
#36263 commented on
Mar 19, 2025 • 0 new comments -
Add D-FINE Model into Transformers
#36261 commented on
Mar 20, 2025 • 0 new comments -
[docs] add return_timestamps=True for Whisper long-form transcription
#35633 commented on
Mar 20, 2025 • 0 new comments -
CVE-2024-11392 - AWS Scanner and Trivy Flagging Transformers 4.48.1 as Vulnerable
#36041 commented on
Mar 20, 2025 • 0 new comments -
Incompatibility in flash_attention_2 + Llama + Transformers>=4.43 + Autocast to fp16
#36224 commented on
Mar 20, 2025 • 0 new comments -
IsADirectoryError when training with tqdm enabled for trainer
#34766 commented on
Mar 20, 2025 • 0 new comments -
`AutoModelForCasualLM.from_pretrained()` exits without warning/error
#36245 commented on
Mar 19, 2025 • 0 new comments -
Marian RNN conversion support
#36651 commented on
Mar 19, 2025 • 0 new comments -
Stop output to stdout in streamers.py methods
#36562 commented on
Mar 19, 2025 • 0 new comments -
Inconsistent output lengths when `max_length=20` is set implicitly vs explicitly in `generate()`
#35765 commented on
Mar 19, 2025 • 0 new comments -
A warning message showing that `MultiScaleDeformableAttention.so` is not found in `/root/.cache/torch_extensions` if `ninja` is installed with `transformers`
#35349 commented on
Mar 19, 2025 • 0 new comments -
Difficulties with multi-GPU Inferencing
#36634 commented on
Mar 19, 2025 • 0 new comments -
Cryptic error when using AutoTokenizer with SentencePiece tokenizers without sentencepiece installed
#36291 commented on
Mar 19, 2025 • 0 new comments -
Model Card to include key information (e.g. max_sequence_length, etc.)
#36743 commented on
Mar 19, 2025 • 0 new comments -
Unable to deploy Gemma 3 on AWS SageMaker due to lack of support in tranfomers release
#36738 commented on
Mar 19, 2025 • 0 new comments -
Error when tokenizer is set to string: `AttributeError: 'str' object has no attribute 'pad_token_id'`
#36731 commented on
Mar 19, 2025 • 0 new comments -
Custom 4D tensor caused shape mismatch error
#35290 commented on
Mar 18, 2025 • 0 new comments -
Gemma 3 1B - TypeError: 'NoneType' object is not callable
#36745 commented on
Mar 18, 2025 • 0 new comments -
Tensor size mismatch when trying to run RT-DETR on multiple gpus
#33165 commented on
Mar 18, 2025 • 0 new comments -
Issue with Progressive Generation Using inputs_embeds and past_key_values
#35707 commented on
Mar 18, 2025 • 0 new comments -
RWKV CUDA error: an illegal memory access was encountered during training from scratch
#35805 commented on
Mar 18, 2025 • 0 new comments -
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 commented on
Mar 18, 2025 • 0 new comments -
Token healing throws error with "Qwen/Qwen2.5-Coder-7B-Instruct"
#36210 commented on
Mar 18, 2025 • 0 new comments -
[bug] use_gather_object is not respected after the first eval in trainer
#36213 commented on
Mar 18, 2025 • 0 new comments -
Model trained with Flash Attention 2.0 raises "RuntimeError: query and key must have the same dtype" when generating
#30019 commented on
Mar 18, 2025 • 0 new comments -
lm_head parameters missing from named_parameters() in Qwen2.5-VL-3B-Instruct model
#36598 commented on
Mar 17, 2025 • 0 new comments -
SDPA `is_causal=False` has no effect due to `LlamaModel._prepare_4d_causal_attention_mask_with_cache_position`
#36150 commented on
Mar 17, 2025 • 0 new comments -
model.generate function is not compatible with custom position_ids
#36510 commented on
Mar 17, 2025 • 0 new comments -
MultiTask Classification and label_names on Trainer
#33193 commented on
Mar 17, 2025 • 0 new comments -
The parameter 'text' may be None as the comments says, there is a confuse.
#36667 commented on
Mar 17, 2025 • 0 new comments -
../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [267,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
#33985 commented on
Mar 17, 2025 • 0 new comments -
torchrun breaks with load_model_at_end and with metric_for_best_model=eval_f1 on question_answering example
#30819 commented on
Mar 17, 2025 • 0 new comments -
Offline mode doesn't work with models that require `trust_remote_code=True`
#34855 commented on
Mar 16, 2025 • 0 new comments -
Fix Batch Size Mismatch When Using `crops_n_layers` in `mask-generation` Pipeline #35530
#35627 commented on
Mar 20, 2025 • 0 new comments -
Process inputs directly in apply_chat_template in image-text-to-text pipeline
#35616 commented on
Mar 19, 2025 • 0 new comments -
Add Relation DETR
#34900 commented on
Mar 20, 2025 • 0 new comments -
Bye bye env vars, keep everything as configs
#34886 commented on
Mar 19, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Mar 21, 2025 • 0 new comments -
[`AutoDocstring`] Based on inspect parsing of the signature
#33771 commented on
Mar 17, 2025 • 0 new comments -
Trainer: add predict with generate
#32346 commented on
Mar 21, 2025 • 0 new comments -
Improve support for image generation with Chameleon & Anole
#32013 commented on
Mar 19, 2025 • 0 new comments -
safetensor/mmap memory leak when per-layer weights are converted do other dtypes
#34366 commented on
Mar 23, 2025 • 0 new comments -
tensor parallel training bug
#36296 commented on
Mar 23, 2025 • 0 new comments -
Bug about num_update_steps_per_epoch in function _inner_training_loop
#36297 commented on
Mar 23, 2025 • 0 new comments -
DS3 zero3_save_16bit_model is not compatible with resume_from_checkpoint
#36317 commented on
Mar 23, 2025 • 0 new comments -
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration
#10105 commented on
Mar 23, 2025 • 0 new comments -
Error during processing: MllamaForCausalLM does not support Flash Attention 2.0 yet.
#36557 commented on
Mar 22, 2025 • 0 new comments -
flash_attention_2 2.7.2.post1 seems to crash when using `torch.compile` and `DataCollatorWithFlattening`
#35588 commented on
Mar 22, 2025 • 0 new comments -
rework `test_multi_gpu_data_parallel_forward`
#31087 commented on
Mar 22, 2025 • 0 new comments -
model.gradient_checkpointing_enable() makes loss.requires_grad be False
#35826 commented on
Mar 22, 2025 • 0 new comments -
Torch -> ONNX doesn't work after upgrading transformers to 4.49.0
#36276 commented on
Mar 22, 2025 • 0 new comments -
AttributeError: 'dict' object has no attribute 'to_dict'; for Inferencing Lora Merged Qwen/Qwen2.5-VL-3B-Instruct
#36281 commented on
Mar 22, 2025 • 0 new comments -
past_key_value(s) name inconsistency causing problems
#36290 commented on
Mar 22, 2025 • 0 new comments -
[Bugs] RuntimeError: No CUDA GPUs are available in transformers v4.48.0 or above when running Ray RLHF example
#36295 commented on
Mar 22, 2025 • 0 new comments -
Add Magma from Microsoft to Transformers
#36629 commented on
Mar 21, 2025 • 0 new comments -
AutoModel from_pretrained does not recursively download relative imports
#36653 commented on
Mar 21, 2025 • 0 new comments -
Qwen2VLForConditionalGeneration doesn't work with MPS devices
#36413 commented on
Mar 21, 2025 • 0 new comments -
Mask2Former _init_weights
#35877 commented on
Mar 21, 2025 • 0 new comments -
Support Flex Attention for encoder only models (XLMRoberta, ModernBERT etc...)
#36697 commented on
Mar 21, 2025 • 0 new comments -
Community contribution: Adding GGUF support for more architectures
#33260 commented on
Mar 21, 2025 • 0 new comments -
cannot import name 'is_timm_config_dict' from 'transformers.utils.generic'
#36068 commented on
Mar 21, 2025 • 0 new comments -
ValueError: Unrecognized image processor in Qwen/Qwen2.5-VL-3B-Instruct.
#36193 commented on
Mar 21, 2025 • 0 new comments -
modeling_phi3 errors with AttributeError: 'DynamicCache' object has no attribute 'get_max_length'
#36071 commented on
Mar 20, 2025 • 0 new comments