Insights: huggingface/transformers
Overview
5 Releases published by 2 people
v4.49.0-AyaVision Aya Vision (Based on v4.49.0)
published
Mar 4, 2025 -
v4.49.0-Gemma-3 Gemma 3 (Based on v4.49.0)
published
Mar 18, 2025 -
v4.49.0-Mistral-3 Mistral 3 (Based on v4.49.0)
published
Mar 18, 2025 -
v4.50.0 Release v4.50.0
published
Mar 21, 2025 -
v4.50.1 Patch release v4.50.1
published
Mar 25, 2025
244 Pull requests merged by 88 people
[docs] Attention mask image
#36970 merged
Mar 26, 2025 -
Remove deprecated training arguments
#36946 merged
Mar 26, 2025 -
fix typos in the code comments and error messages
#36993 merged
Mar 26, 2025 -
Log the correct learning rate
#36973 merged
Mar 26, 2025 -
Fix device_map check for ggml files
#37003 merged
Mar 26, 2025 -
Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support.
#36975 merged
Mar 26, 2025 -
Allow easy registration of custom attention functions
#36889 merged
Mar 26, 2025 -
Fix get_device_properties
#36997 merged
Mar 26, 2025 -
Fix Optional type annotation
#36841 merged
Mar 26, 2025 -
Install networkx==3.2.1 manually in some CircleCI jobs after #36957
#37000 merged
Mar 26, 2025 -
Use torch.expm1
#36995 merged
Mar 26, 2025 -
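The "Use torch.expm1" entry above replaces `exp(x) - 1` computations with the fused `expm1` primitive. The numerical rationale can be illustrated with Python's stdlib `math.expm1`, which has the same semantics as the torch version (the values below are illustrative, not from the PR):

```python
import math

# For small x, exp(x) - 1 suffers catastrophic cancellation: exp(x) is
# rounded near 1.0, so the subtraction destroys most significant digits.
# expm1(x) evaluates exp(x) - 1 directly and stays accurate.
x = 1e-12
naive = math.exp(x) - 1.0   # noticeably wrong in the low digits
stable = math.expm1(x)      # accurate to machine precision

rel_err_naive = abs(naive - x) / x
rel_err_stable = abs(stable - x) / x
print(rel_err_naive > 1e-6)   # True: naive form is off by a visible margin
print(rel_err_stable < 1e-9)  # True: expm1 keeps full precision
```

The same reasoning applies to `torch.expm1` on tensors; the PR swaps the subtraction form for the fused op.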
byebye CircleCI TF jobs
#36998 merged
Mar 26, 2025 -
Fix tensor dtype mismatch
#36985 merged
Mar 26, 2025 -
🚨Deprecate legacy argument for image-text-to-text models and adopt new behavior by default
#36307 merged
Mar 25, 2025 -
update bot comment again
#36974 merged
Mar 25, 2025 -
Add ruff target-version
#36971 merged
Mar 25, 2025 -
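The "Add ruff target-version" entry above pins the Python version ruff assumes when linting and autofixing. In a `pyproject.toml` this typically looks like the fragment below; the `py39` value is an assumption inferred from the repository's `pyupgrade --py39-plus` cleanups listed elsewhere on this page, not confirmed from the PR:

```toml
[tool.ruff]
# Pin the Python version ruff targets so autofixes never introduce
# syntax newer than the project's minimum supported interpreter.
target-version = "py39"
```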
[docs] Fix image link
#36869 merged
Mar 25, 2025 -
Remove extra tensor clone in PyTorch code
#36748 merged
Mar 25, 2025 -
update
#36972 merged
Mar 25, 2025 -
Updated docker files to use uv for installing packages
#36957 merged
Mar 25, 2025 -
typo fixed in README_fr.md
#36951 merged
Mar 25, 2025 -
Change GPUS to GPUs
#36945 merged
Mar 25, 2025 -
Update after #36962
#36965 merged
Mar 25, 2025 -
Update ruff to 0.11.2
#36962 merged
Mar 25, 2025 -
[Utils] torch version checks optionally accept dev versions
#36847 merged
Mar 25, 2025 -
Fix cuda index issue in cache allocator
#36937 merged
Mar 25, 2025 -
Support return_tensors in audio chat templates
#34601 merged
Mar 25, 2025 -
fix typos in the tests directory
#36932 merged
Mar 25, 2025 -
Export for Phi4-mini
#36780 merged
Mar 25, 2025 -
Fixing _pre_quantization_dtype when torch_dtype is None
#36930 merged
Mar 25, 2025 -
Add Phi4 multimodal
#36939 merged
Mar 25, 2025 -
Deprecate #36741 and map Causal to Conditional
#36917 merged
Mar 25, 2025 -
Disallow Offload to disk for gguf files
#36933 merged
Mar 24, 2025 -
Fix processor kwargs qwen2 vl
#36890 merged
Mar 24, 2025 -
Added support for seed in DataCollatorForWholeWordMask
#36903 merged
Mar 24, 2025 -
More precise comment
#36935 merged
Mar 24, 2025 -
Fix pytorch deformable attn path
#36923 merged
Mar 24, 2025 -
[2/N] Use pyupgrade --py39-plus to improve code
#36857 merged
Mar 24, 2025 -
Update trainer_pt_utils.py docstrings for consistency
#36912 merged
Mar 24, 2025 -
Fix typos
#36910 merged
Mar 24, 2025 -
Use another repo. for Mistral3 processor testing
#36925 merged
Mar 24, 2025 -
Fix Compressed tensors to_dict_diff
#36922 merged
Mar 24, 2025 -
[chameleon] fix num image token check
#36918 merged
Mar 24, 2025 -
tests: fix asyncio.wait() usage for python>=3.11
#36898 merged
Mar 24, 2025 -
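The asyncio fix above addresses a Python 3.11 behavior change: `asyncio.wait()` no longer accepts bare coroutines, so callers must wrap them in tasks first. A minimal sketch of the corrected pattern (the `fetch` coroutine is illustrative, not from the test suite):

```python
import asyncio

async def fetch(value: int) -> int:
    # Stand-in for real async work in a test.
    await asyncio.sleep(0)
    return value * 2

async def main() -> list[int]:
    # Since Python 3.11, asyncio.wait() requires Task/Future objects;
    # passing coroutines directly raises TypeError.
    tasks = [asyncio.create_task(fetch(i)) for i in range(3)]
    done, pending = await asyncio.wait(tasks)
    assert not pending  # default return_when is ALL_COMPLETED
    return sorted(task.result() for task in done)

print(asyncio.run(main()))  # [0, 2, 4]
```

On 3.8 through 3.10 passing coroutines only emitted a DeprecationWarning, which is why the breakage surfaced with the 3.11 upgrade.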
[Fix] Add original_max_position_embeddings to YARN rope_scaling optional keys
#36877 merged
Mar 24, 2025 -
Fix torch version guard at import
#36907 merged
Mar 24, 2025 -
fix Gemma3 Config
#36893 merged
Mar 24, 2025 -
Update installation.md
#36826 merged
Mar 21, 2025 -
[docs] Model docs
#36469 merged
Mar 21, 2025 -
Fix Pan and Scan on batched images Gemma3
#36864 merged
Mar 21, 2025 -
Simplify keep_in_fp32_modules logic
#36722 merged
Mar 21, 2025 -
fix: loss computation after embeddings resize - mllama
#36840 merged
Mar 21, 2025 -
Fix: dtype cannot be str
#36262 merged
Mar 21, 2025 -
Minor Gemma 3 fixes
#36884 merged
Mar 21, 2025 -
Use deformable_detr kernel from the Hub
#36853 merged
Mar 21, 2025 -
Gemma 3 tests expect greedy decoding
#36882 merged
Mar 21, 2025 -
🔴 🔴 🔴 supersede paligemma forward to shift pos id indexing
#36859 merged
Mar 21, 2025 -
[generate] model defaults being inherited only happens for newer models
#36881 merged
Mar 21, 2025 -
Revert "Update deprecated Jax calls (#35919)"
#36880 merged
Mar 21, 2025 -
Make ViTPooler configurable
#36517 merged
Mar 21, 2025 -
chore: fix typos in the tests directory
#36813 merged
Mar 21, 2025 -
Remove call to .item in get_batch_samples
#36861 merged
Mar 21, 2025 -
FIX FSDP plugin update for QLoRA
#36720 merged
Mar 21, 2025 -
[CI] doc builder without custom image
#36862 merged
Mar 21, 2025 -
Mllama: raise better error
#35934 merged
Mar 21, 2025 -
Refactor Aya Vision with modular
#36688 merged
Mar 20, 2025 -
Add support for seed in DataCollatorForLanguageModeling
#36497 merged
Mar 20, 2025 -
[CI] fix update metadata job
#36850 merged
Mar 20, 2025 -
Gemma3: fix test
#36820 merged
Mar 20, 2025 -
[torchao] revert to get_apply_tensor_subclass
#36849 merged
Mar 20, 2025 -
Add model visual debugger
#36798 merged
Mar 20, 2025 -
Add Prompt Depth Anything Model
#35401 merged
Mar 20, 2025 -
Refactor Attention implementation for ViT-based models
#36545 merged
Mar 20, 2025 -
DeepSpeed tensor parallel+ZeRO
#36825 merged
Mar 20, 2025 -
Support loading Quark quantized models in Transformers
#36372 merged
Mar 20, 2025 -
Use pyupgrade --py39-plus to improve code
#36843 merged
Mar 20, 2025 -
Fix hqq skipped modules and dynamic quant
#36821 merged
Mar 20, 2025 -
Fix ONNX export for sequence classification head
#36332 merged
Mar 20, 2025 -
Shieldgemma2
#36678 merged
Mar 20, 2025 -
Fix: remove the redundant snippet of _whole_word_mask
#36759 merged
Mar 20, 2025 -
Gemma 3: Adding explicit GenerationConfig and refactoring conversion …
#36833 merged
Mar 20, 2025 -
Fix import for torch 2.0, 2.1 - guard typehint for "device_mesh"
#36768 merged
Mar 20, 2025 -
Update min safetensors bis
#36823 merged
Mar 20, 2025 -
[generate] clarify docstrings: when to inherit GenerationMixin
#36605 merged
Mar 20, 2025 -
[modular] Sort modular skips
#36304 merged
Mar 20, 2025 -
Pass state dict
#35234 merged
Mar 20, 2025 -
[qwen2 audio] remove redundant code and update docs
#36282 merged
Mar 20, 2025 -
Update deprecated Jax calls
#35919 merged
Mar 20, 2025 -
Fix fp16 ONNX export for RT-DETR and RT-DETRv2
#36460 merged
Mar 20, 2025 -
Pass num_items_in_batch directly to loss computation
#36753 merged
Mar 20, 2025 -
Saving Trainer.collator.tokenizer when Trainer.processing_class is None
#36552 merged
Mar 20, 2025 -
fix tiktoken convert to pass AddedToken to Tokenizer
#36566 merged
Mar 20, 2025 -
[ForCausalLMLoss] allow users to pass shifted labels
#36607 merged
Mar 20, 2025 -
Disable inductor config setter by default
#36608 merged
Mar 20, 2025 -
Fix swanlab global step
#36728 merged
Mar 20, 2025 -
rewrite main method in Qwen2, making it more clear
#36772 merged
Mar 20, 2025 -
Move the warning to the documentation for DataCollatorWithFlattening
#36707 merged
Mar 20, 2025 -
Remove our AdamW implementation
#36177 merged
Mar 19, 2025 -
Update configuration_qwen2.py
#36735 merged
Mar 19, 2025 -
quick fix fast_image_processor register error
#36716 merged
Mar 19, 2025 -
Add Space to Bitsandbytes doc
#36834 merged
Mar 19, 2025 -
Support traceable DynamicCache
#36311 merged
Mar 19, 2025 -
One more fix for reviewer assignment
#36829 merged
Mar 19, 2025 -
[gemma 3] multimodal checkpoints + AutoModelForCausalLM
#36741 merged
Mar 19, 2025 -
enable OffloadedCache on XPU from PyTorch 2.7
#36654 merged
Mar 19, 2025 -
Add option for ao base configs
#36526 merged
Mar 19, 2025 -
Add attention visualization tool
#36630 merged
Mar 19, 2025 -
[Generation] remove leftover code from end-to-end compilation
#36685 merged
Mar 19, 2025 -
Fix Device map for bitsandbytes tests
#36800 merged
Mar 19, 2025 -
Remove "dist": "loadfile" for pytest for CircleCI jobs
#36811 merged
Mar 19, 2025 -
fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model
#36572 merged
Mar 19, 2025 -
Expectations test utils
#36569 merged
Mar 18, 2025 -
[generate] ✨ vectorized beam search ✨
#35802 merged
Mar 18, 2025 -
Support custom docstrings in modular
#36726 merged
Mar 18, 2025 -
Fix chameleon's TypeError because inputs_embeds may be None
#36673 merged
Mar 18, 2025 -
Fix casting dtype for quantization
#36799 merged
Mar 18, 2025 -
Fix Mistral3 tests
#36797 merged
Mar 18, 2025 -
Loading optimizations
#36742 merged
Mar 18, 2025 -
Update SHA for tj-actions/changed-files
#36795 merged
Mar 18, 2025 -
fix hqq due to recent modeling changes
#36771 merged
Mar 18, 2025 -
Add Mistral3
#36790 merged
Mar 18, 2025 -
Fix gemma3_text tokenizer in mapping
#36793 merged
Mar 18, 2025 -
Fixing typo in gemma3 image_processor_fast and adding a small test
#36776 merged
Mar 18, 2025 -
chore: fix typos in tests directory
#36785 merged
Mar 18, 2025 -
fix typos in the tests directory
#36717 merged
Mar 17, 2025 -
doc: Clarify is_decoder usage in PretrainedConfig documentation
#36724 merged
Mar 17, 2025 -
[docs] Update README
#36265 merged
Mar 17, 2025 -
[CI] remove redundant checks in test_eager_matches_sdpa_inference
#36740 merged
Mar 17, 2025 -
[MINOR:TYPO] Update hubert.md
#36733 merged
Mar 17, 2025 -
Fix TrainingArguments.torch_empty_cache_steps post_init check
#36734 merged
Mar 17, 2025 -
Fix test isolation for clear_import_cache utility
#36345 merged
Mar 17, 2025 -
fix xpu tests
#36656 merged
Mar 17, 2025 -
Allow ray datasets to be used with trainer
#36699 merged
Mar 17, 2025 -
fix can_generate
#36570 merged
Mar 17, 2025 -
enable/disable compile for quants methods
#36519 merged
Mar 17, 2025 -
🚨🚨🚨 Fix sdpa in sam and refactor relative position embeddings
#36422 merged
Mar 17, 2025 -
Fix grad accum arbitrary value
#36691 merged
Mar 14, 2025 -
Fix post_init() code duplication
#36727 merged
Mar 14, 2025 -
🌐 [i18n-KO] Translated codegen.md to Korean
#36698 merged
Mar 14, 2025 -
[tests] Parameterized test_eager_matches_sdpa_inference
#36650 merged
Mar 14, 2025 -
Try working around the processor registration bugs
#36184 merged
Mar 14, 2025 -
Fix/best model checkpoint fix
#35885 merged
Mar 14, 2025 -
[model loading] don't gc.collect() if only 1 shard is used
#36721 merged
Mar 14, 2025 -
Cleanup the regex used for doc preprocessing
#36648 merged
Mar 14, 2025 -
Make the flaky list a little more general
#36704 merged
Mar 14, 2025 -
Gemma3 processor typo
#36710 merged
Mar 14, 2025 -
Add support for fast image processors in add-new-model-like CLI
#36313 merged
Mar 13, 2025 -
Final CI cleanup
#36703 merged
Mar 13, 2025 -
Add GGUF support to T5-Encoder
#36700 merged
Mar 13, 2025 -
Handling an exception related to HQQ quantization in modeling
#36702 merged
Mar 13, 2025 -
fix: fsdp sharded state dict won't work for save_only_model knob
#36627 merged
Mar 13, 2025 -
Add loading speed test
#36671 merged
Mar 13, 2025 -
[CI] Automatic rerun of certain test failures
#36694 merged
Mar 13, 2025 -
chore: fix typos in utils module
#36668 merged
Mar 13, 2025 -
Fix dtype for params without tp_plan
#36681 merged
Mar 13, 2025 -
fix type annotation for ALL_ATTENTION_FUNCTIONS
#36690 merged
Mar 13, 2025 -
Change Qwen2_VL image processors to have init and call accept the same kwargs
#36207 merged
Mar 13, 2025 -
Upgrading torch version and cuda version in quantization docker
#36264 merged
Mar 13, 2025 -
fix wandb hp search unable to resume from sweep_id
#35883 merged
Mar 13, 2025 -
Changing the test model in Quanto kv cache
#36670 merged
Mar 13, 2025 -
Fix slicing for 0-dim param
#36580 merged
Mar 13, 2025 -
Update config.torch_dtype correctly
#36679 merged
Mar 13, 2025 -
[Cache] Don't initialize the cache on meta device
#36543 merged
Mar 13, 2025 -
Fix rescale normalize inconsistencies in fast image processors
#36388 merged
Mar 13, 2025 -
Refactor siglip2 fast image processor
#36406 merged
Mar 13, 2025 -
Remove differences between init and preprocess kwargs for fast image processors
#36186 merged
Mar 12, 2025 -
[quants] refactor logic for modules_to_not_convert
#36672 merged
Mar 12, 2025 -
Remove hardcoded slow image processor class in processors supporting fast ones
#36266 merged
Mar 12, 2025 -
Fix Failing GPTQ tests
#36666 merged
Mar 12, 2025 -
Don't accidentally mutate the base_model_tp_plan
#36677 merged
Mar 12, 2025 -
[core] Large/full refactor of from_pretrained
#36033 merged
Mar 12, 2025 -
Fix bnb regression due to empty state dict
#36663 merged
Mar 12, 2025 -
[CI] gemma 3 make fix-copies
#36664 merged
Mar 12, 2025 -
fix block mask typing
#36661 merged
Mar 12, 2025 -
HPU support
#36424 merged
Mar 12, 2025 -
Gemma3
#36658 merged
Mar 12, 2025 -
fix typos in the docs directory
#36639 merged
Mar 11, 2025 -
Fix gguf docs
#36601 merged
Mar 11, 2025 -
Remove research projects
#36645 merged
Mar 11, 2025 -
[docs] Update docs dependency
#36635 merged
Mar 11, 2025 -
Stop warnings from unnecessary torch.tensor() overuse
#36538 merged
Mar 11, 2025 -
Remove remote code warning
#36285 merged
Mar 11, 2025 -
Fix AriaForConditionalGeneration flex attn test
#36604 merged
Mar 11, 2025 -
Proper_flex
#36643 merged
Mar 11, 2025 -
Fix bugs in mllama image processing
#36156 merged
Mar 11, 2025 -
Refactor some core stuff
#36539 merged
Mar 11, 2025 -
[docs] Serving LLMs
#36522 merged
Mar 10, 2025 -
chore: fix typos in language models
#36586 merged
Mar 10, 2025 -
Fix auto-assign reviewers
#36631 merged
Mar 10, 2025 -
[HybridCache] disable automatic compilation
#36620 merged
Mar 10, 2025 -
Fix check for XPU. PyTorch >= 2.6 no longer needs ipex.
#36593 merged
Mar 7, 2025 -
Fixed datatype related issues in DataCollatorForLanguageModeling
#36457 merged
Mar 7, 2025 -
Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/decision_transformer
#36582 merged
Mar 7, 2025 -
Update "who to tag" / "who can review"
#36394 merged
Mar 7, 2025 -
Update chat_extras.md with content correction
#36599 merged
Mar 7, 2025 -
Github action for auto-assigning reviewers
#35846 merged
Mar 7, 2025 -
Export base streamer.
#36500 merged
Mar 7, 2025 -
avoid errors when the size of input_ids passed to PrefixConstrainedLogitsProcessor is zero
#36489 merged
Mar 7, 2025 -
Mention UltraScale Playbook 🌌 in docs
#36589 merged
Mar 6, 2025 -
fix: argument
#36558 merged
Mar 6, 2025 -
[XGLM] tag tests as slow
#36592 merged
Mar 6, 2025 -
[bark] fix loading of generation config
#36587 merged
Mar 6, 2025 -
Integrate SwanLab for offline/online experiment tracking and local visualization
#36433 merged
Mar 6, 2025 -
Modular Conversion --fix_and_overwrite on Windows
#36583 merged
Mar 6, 2025 -
Delete redundant if case in model_utils
#36559 merged
Mar 6, 2025 -
Bump transformers from 4.38.0 to 4.48.0 in /examples/research_projects/pplm
#36540 merged
Mar 6, 2025 -
chore: enhance message descriptions in parameters, comments, logs and docstrings
#36554 merged
Mar 6, 2025 -
Fix typos.
#36551 merged
Mar 6, 2025 -
Fix typos in tests
#36547 merged
Mar 5, 2025 -
guard torch version for uint16
#36520 merged
Mar 5, 2025 -
chore: enhance messages in docstrings
#36525 merged
Mar 4, 2025 -
Fix links in quantization doc
#36528 merged
Mar 4, 2025 -
Fix bamba tests amd
#36535 merged
Mar 4, 2025 -
chore: Fix typos in docs and examples
#36524 merged
Mar 4, 2025 -
Add aya
#36521 merged
Mar 4, 2025 -
[docs] Redesign
#31757 merged
Mar 3, 2025 -
Remove unused code
#36459 merged
Mar 3, 2025 -
[Style] fix E721 warnings
#36474 merged
Mar 3, 2025 -
Fix edge case for continue_final_message
#36404 merged
Mar 3, 2025 -
Fix pipeline+peft interaction
#36480 merged
Mar 3, 2025 -
chore: fix message descriptions in arguments and comments
#36504 merged
Mar 3, 2025 -
Fix some typos in docs
#36502 merged
Mar 3, 2025 -
fix torch_dtype, contiguous, and load_state_dict regression
#36512 merged
Mar 3, 2025 -
Fix kwargs UserWarning in SamImageProcessor
#36479 merged
Mar 3, 2025 -
Check TRUST_REMOTE_CODE for RealmRetriever for security
#36511 merged
Mar 3, 2025 -
Fix loading zero3 weights
#36455 merged
Mar 3, 2025 -
Fix _load_state_dict_into_meta_model with device_map=None
#36488 merged
Mar 2, 2025 -
Fix couples of issues from #36335
#36453 merged
Mar 1, 2025 -
Add Got-OCR 2 Fast image processor and refactor slow one
#36185 merged
Mar 1, 2025 -
[docs] fix bug in deepspeed config
#36081 merged
Feb 28, 2025 -
Fix loading models with mismatched sizes
#36463 merged
Feb 28, 2025 -
[GroundingDino] Fix grounding dino loss 🚨
#31828 merged
Feb 27, 2025 -
Fix hub_retry
#36449 merged
Feb 27, 2025 -
Lazy import libraries in src/transformers/image_utils.py
#36435 merged
Feb 27, 2025 -
[generate] torch.distributed-compatible DynamicCache
#36373 merged
Feb 27, 2025 -
[save_pretrained ] Skip collecting duplicated weight
#36409 merged
Feb 27, 2025 -
Add contents: write
#36445 merged
Feb 27, 2025 -
Fix another permission
#36444 merged
Feb 27, 2025 -
Fix permission
#36443 merged
Feb 27, 2025 -
Change PR to draft when it is (re)opened
#36417 merged
Feb 27, 2025 -
restrict cache allocator to non quantized model
#36428 merged
Feb 26, 2025 -
Fix Expected output for compressed-tensors tests
#36425 merged
Feb 26, 2025 -
Update from_pretrained to make TP a first-class citizen
#36335 merged
Feb 26, 2025
129 Pull requests opened by 93 people
Add fetch_paginated_github_data to deduplicate GitHub API pagination …
#36432 opened
Feb 26, 2025 -
Fix model saving bug post training with tensor parallel in Accelerate
#36434 opened
Feb 26, 2025 -
Add PlainDETR
#36437 opened
Feb 26, 2025 -
add FlashAttentionKwargs and seq_idx to flat collator
#36456 opened
Feb 27, 2025 -
Customize docstrings fast image processor
#36466 opened
Feb 27, 2025 -
Add NVIDIA Cosmos
#36476 opened
Feb 28, 2025 -
Fix incorrect attention mask truncate in WhisperFlashAttention2
#36477 opened
Feb 28, 2025 -
Sanitize Model Module Names to Follow Python Conventions
#36478 opened
Feb 28, 2025 -
Export T5 (encoder-decoder) to ExecuTorch
#36486 opened
Mar 1, 2025 -
Allow OOV Image Token for LLaVa Next Variants
#36491 opened
Mar 2, 2025 -
Create and Expose SamVisionModel as public for better accessibility
#36493 opened
Mar 2, 2025 -
Add an event related to forward in the TrainerCallback
#36496 opened
Mar 2, 2025 -
Refactor object-detection models
#36514 opened
Mar 3, 2025 -
[Validation] First implementation of `@strict_dataclass` from `huggingface_hub`
#36534 opened
Mar 4, 2025 -
add-long-vita
#36553 opened
Mar 5, 2025 -
Fix edge case for tokenize (#36277)
#36555 opened
Mar 5, 2025 -
add-LongVITAModel
#36556 opened
Mar 5, 2025 -
fix for loading gguf quantized model
#36563 opened
Mar 5, 2025 -
Allow saving and loading multiple "raw" chat template files
#36588 opened
Mar 6, 2025 -
Attention mechanisms elaboration
#36597 opened
Mar 6, 2025 -
[audio utils] fix fft_bin_width computation
#36603 opened
Mar 7, 2025 -
Add StableAdamW Optimizer
#36606 opened
Mar 7, 2025 -
Fixed 30s timestamp resets in Whisper long-form transcription
#36612 opened
Mar 7, 2025 -
Add Distill Any Depth
#36614 opened
Mar 8, 2025 -
[WIP] Add support to load models with transforms
#36621 opened
Mar 9, 2025 -
[WiP] Add Aimv2 model
#36625 opened
Mar 10, 2025 -
[Whisper] 🚨 Fix pipeline word timestamp: timestamp token is end of token time !!!
#36632 opened
Mar 10, 2025 -
Refine parameter type annotations
#36644 opened
Mar 11, 2025 -
Fix device issue in modeling_qwen2
#36647 opened
Mar 11, 2025 -
[i18n-KO] Translated `keypoint_detection.md` to Korean
#36649 opened
Mar 11, 2025 -
Fixes DynamicCache export issues due to control flow and inplace modifications
#36652 opened
Mar 11, 2025 -
Update quantizer_bnb_4bit.py
#36669 opened
Mar 12, 2025 -
don't pass NoneType for keep_in_fp32_modules
#36675 opened
Mar 12, 2025 -
Support batch size > 1 image-text inference
#36682 opened
Mar 12, 2025 -
prune LM Head for USD
#36695 opened
Mar 13, 2025 -
[Feature] Support using FlashAttention2 on Ascend NPU
#36696 opened
Mar 13, 2025 -
Limit numpy version to <2.0.0
#36706 opened
Mar 13, 2025 -
Fix long lagging when streaming text without spaces and CJK chars
#36708 opened
Mar 13, 2025 -
[WIP] Add DINO DETR Model to HuggingFace Transformers
#36711 opened
Mar 14, 2025 -
fix whisper re-compile
#36712 opened
Mar 14, 2025 -
Add CSM model
#36719 opened
Mar 14, 2025 -
Fix generation using flash-attention and static cache
#36729 opened
Mar 14, 2025 -
Fix image processor speedup fixed
#36732 opened
Mar 14, 2025 -
Convert _VALID_DICT_FIELDS to class attribute for shared dict parsing in subclasses
#36736 opened
Mar 15, 2025 -
[WIP] PagedAttention + Prefix Cache for FlashAttention2
#36737 opened
Mar 15, 2025 -
🌐 [i18n-KO] Translated `qwen2_vl.md` to Korean
#36750 opened
Mar 16, 2025 -
Add Qwen2.5-Omni
#36752 opened
Mar 16, 2025 -
🌐 [i18n-KO] Translated 'serving.md' to Korean
#36756 opened
Mar 17, 2025 -
🌐 [i18n-KO] Translated `gpu_selection.md` to Korean
#36757 opened
Mar 17, 2025 -
feat: expose the strict flag to allow catching missing model layers while loading a checkpoint
#36760 opened
Mar 17, 2025 -
🌐 [i18n-KO] Translated `electra.md` to Korean
#36763 opened
Mar 17, 2025 -
Add support for audios in apply_chat_template
#36770 opened
Mar 17, 2025 -
Use public export API on torch 2.5 and future
#36781 opened
Mar 18, 2025 -
Fix attention_mask dimension issue in GPT2Model
#36782 opened
Mar 18, 2025 -
Create modeling_ngen3.py for NGen3
#36787 opened
Mar 18, 2025 -
Update configuration_auto.py for NGen3
#36791 opened
Mar 18, 2025 -
Refactor `return_dict` logic to remove complicated if/else paths
#36794 opened
Mar 18, 2025 -
[don't merge] check tokenizer ci job
#36796 opened
Mar 18, 2025 -
Add Granite Speech Support
#36801 opened
Mar 18, 2025 -
Add long vita
#36807 opened
Mar 19, 2025 -
Support loading custom models (`trust_remote_code=True`) in offline mode from local
#36808 opened
Mar 19, 2025 -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 opened
Mar 19, 2025 -
Use `lru_cache` for tokenization tests
#36818 opened
Mar 19, 2025 -
Dummies
#36827 opened
Mar 19, 2025 -
[Modeling] Load FP8 safetensors such as DeepSeek
#36828 opened
Mar 19, 2025 -
gemma3 fp16 fix
#36832 opened
Mar 19, 2025 -
Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag
#36835 opened
Mar 19, 2025 -
Remove unnecessary attr assignment
#36837 opened
Mar 19, 2025 -
Move `return_dict` logic into `can_return_tuple` decorator
#36838 opened
Mar 19, 2025 -
Haocheng lu
#36839 opened
Mar 19, 2025 -
fix pegasus init weights and other copied models
#36844 opened
Mar 20, 2025 -
Add support for specifying revisions when pushing to Hub via internal Trainer call
#36852 opened
Mar 20, 2025 -
fix: prevent input side-effects in processor text args
#36866 opened
Mar 20, 2025 -
Only count num items in batch when needed
#36867 opened
Mar 20, 2025 -
Fix warning message for PEFT models in text-generation pipeline #36783
#36868 opened
Mar 20, 2025 -
Improve Model Download Speeds By ~3x For Large Models
#36870 opened
Mar 21, 2025 -
Adding Qwen3 and Qwen3MoE
#36878 opened
Mar 21, 2025 -
Fix `resume_from_checkpoint` not recognising `"last-checkpoint"`
#36883 opened
Mar 21, 2025 -
Optimize `to_py_obj` for python-native numeric lists and scalars
#36885 opened
Mar 21, 2025 -
Fix warning message for PEFT models in text-generation pipeline #36783
#36887 opened
Mar 21, 2025 -
Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0)
#36891 opened
Mar 21, 2025 -
[WIP] Computer vision util: vision visualizer
#36892 opened
Mar 21, 2025 -
Enable tracing for Moshi
#36894 opened
Mar 21, 2025 -
Add RF-DETR
#36895 opened
Mar 21, 2025 -
Adding ArlowGPT
#36899 opened
Mar 22, 2025 -
Add NGen3
#36901 opened
Mar 22, 2025 -
LogfireCallback: Integrating Logfire with Hugging Face’s Trainer
#36905 opened
Mar 22, 2025 -
fix cached file error when repo type is dataset
#36909 opened
Mar 23, 2025 -
Limit number of evaluation samples processed during training
#36916 opened
Mar 24, 2025 -
[qwen2-audio] remove default template
#36919 opened
Mar 24, 2025 -
Allow disabling `deformable_detr` kernels
#36927 opened
Mar 24, 2025 -
Remove the redundant shift during the loss computation in the Moshi m…
#36928 opened
Mar 24, 2025 -
Aligning modling code for GPT2 to work with vLLM (fallback)
#36934 opened
Mar 24, 2025 -
[3/N] Use pyupgrade --py39-plus to improve code
#36936 opened
Mar 24, 2025 -
Static cache should support indexing
#36943 opened
Mar 24, 2025 -
Improve typing in TrainingArgument
#36944 opened
Mar 25, 2025 -
fix(qwen): fix shape error when using tp
#36947 opened
Mar 25, 2025 -
Update image_processing_qwen2_vl.py. Fix bug.
#36948 opened
Mar 25, 2025 -
Avoid unnecessary tensor copy in loss computing
#36950 opened
Mar 25, 2025 -
Added Sapnous Architecture
#36952 opened
Mar 25, 2025 -
Skip code `307` in `RequestCounter`
#36953 opened
Mar 25, 2025 -
[chat templates] support loading audio from video
#36955 opened
Mar 25, 2025 -
fix: Fully remove legacy cache from Llama
#36958 opened
Mar 25, 2025 -
Remove low_cpu_mem_usage and _fast_init
#36963 opened
Mar 25, 2025 -
More ReDOS fixes!
#36964 opened
Mar 25, 2025 -
[phi-4] use mel filters from audio utils
#36966 opened
Mar 25, 2025 -
Add new dim to `num_items_in_batch` if necessary
#36967 opened
Mar 25, 2025 -
Make executorch integration more seamless by analyzing model signature
#36969 opened
Mar 25, 2025 -
Refactor image processor phi4
#36976 opened
Mar 25, 2025 -
Add device workaround for int4 weight only quantization after API update
#36980 opened
Mar 25, 2025 -
Refactor attention for SigLIP based models
#36981 opened
Mar 25, 2025 -
clean pipeline question_answering.
#36986 opened
Mar 26, 2025 -
fix comment misdirection during scaling loss
#36987 opened
Mar 26, 2025 -
fix transformers_cli import relative path issue
#36989 opened
Mar 26, 2025 -
Gaudi: fix the pipeline failure with hpu device
#36990 opened
Mar 26, 2025 -
Set weights_only in torch.load
#36991 opened
Mar 26, 2025 -
fix and enhance pipeline_webserver.md
#36992 opened
Mar 26, 2025 -
remove redundant code in trainer
#36994 opened
Mar 26, 2025 -
[Phi4] add multimodal chat template
#36996 opened
Mar 26, 2025 -
Add Fast SamImageProcessor
#36999 opened
Mar 26, 2025 -
Replace default split function with jnp.split() in flax models
#37001 opened
Mar 26, 2025 -
Fix typing for None valued variables
#37004 opened
Mar 26, 2025 -
[Fast Processor] BEiT
#37005 opened
Mar 26, 2025 -
Remove deprecated batch_size argument
#37007 opened
Mar 26, 2025 -
Skip FP8 linear tests
#37008 opened
Mar 26, 2025 -
Export Whisper to ExecuTorch
#37009 opened
Mar 26, 2025 -
Fix AttentionInterface following feedback
#37010 opened
Mar 26, 2025 -
Add Fast Chinese-CLIP Processor
#37012 opened
Mar 26, 2025 -
[generate, cache] handle more complex device maps
#37014 opened
Mar 26, 2025
162 Issues closed by 47 people
Learning rate logging off by one training step
#35942 closed
Mar 26, 2025 -
ValueError: `run_compressed` is only supported for quantized_compressed models
#36915 closed
Mar 26, 2025 -
Recent update: configuration_eurobert.py not found -
#36983 closed
Mar 26, 2025 -
Issue with Progressive Generation Using inputs_embeds and past_key_values
#35707 closed
Mar 26, 2025 -
RWKV CUDA error: an illegal memory access was encountered during training from scratch
#35805 closed
Mar 26, 2025 -
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 closed
Mar 26, 2025 -
Token healing throws error with "Qwen/Qwen2.5-Coder-7B-Instruct"
#36210 closed
Mar 26, 2025 -
[bug] use_gather_object is not respected after the first eval in trainer
#36213 closed
Mar 26, 2025 -
Error: TypeError: argument 'ids': 'float' object cannot be interpreted as an integer
#36984 closed
Mar 26, 2025 -
Clarification on Commercial License Impact of LayoutLMv3ImageProcessor within UdopProcessor
#36931 closed
Mar 25, 2025 -
ImportError: cannot import name 'AdamW' from 'transformers'
#36954 closed
Mar 25, 2025 -
AutoTokenizer/Processor does not work with Mistral3 models
#36968 closed
Mar 25, 2025 -
Ruff update
#36705 closed
Mar 25, 2025 -
torchrun breaks with load_model_at_end and with metric_for_best_model=eval_f1 on question_answering example
#30819 closed
Mar 25, 2025 -
`Mllama` not supported by `AutoModelForCausalLM` after updating `transformers` to `4.50.0`
#36926 closed
Mar 25, 2025 -
Florence2 stopped working after upgrade to 4.50.0 ("Unrecognized configuration class")
#36886 closed
Mar 25, 2025 -
Design question for integrating new model to Transformers?
#36784 closed
Mar 25, 2025 -
Add seed to data collator classes
#36655 closed
Mar 24, 2025 -
Torch -> ONNX doesn't work after upgrading transformers to 4.49.0
#36276 closed
Mar 24, 2025 -
<spam>
#36924 closed
Mar 24, 2025 -
llama tokenizer encode -> decode is not same
#36325 closed
Mar 24, 2025 -
tj-actions/changed-files action compromised
#36761 closed
Mar 24, 2025 -
Some of test/utils tests fail being invalidated by tests/utils/test_import_utils.py::test_clear_import_cache
#36334 closed
Mar 24, 2025 -
MacOs: register_pytree_node got an unexpected keyword argument 'flatten_with_keys_fn'
#36906 closed
Mar 24, 2025 -
Issue with update
#36888 closed
Mar 24, 2025 -
Trainer: TensorBoardCallback not working for "on_save" and "on_save_end" events
#35612 closed
Mar 24, 2025 -
Pipeline cannot guess which processor to use with Gemma 3
#36911 closed
Mar 23, 2025 -
Unable to export GLM models to ONNX
#35021 closed
Mar 23, 2025 -
`modular_model_converter` can not handle objects import via try - except
#35414 closed
Mar 23, 2025 -
`TFViTModel` and `interpolate_pos_encoding=True`
#36155 closed
Mar 23, 2025 -
[BART] Cannot copy out of meta tensor; no data!
#36247 closed
Mar 21, 2025 -
Bug introduced in `from_pretrained` `v4.48.3`..`v4.49.0`
#36258 closed
Mar 21, 2025 -
<spam>
#36876 closed
Mar 21, 2025 -
Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
#36124 closed
Mar 21, 2025 -
Allow setting a seed for DataCollatorForLanguageModeling
#36357 closed
Mar 20, 2025 -
LlamaAttention has no attribute `rotary_emb` (4.50.0.dev0)
#36758 closed
Mar 20, 2025 -
GPT2 repetition of words in output
#36848 closed
Mar 20, 2025 -
num_items_in_batch unexpected in vision encoder decoder
#36744 closed
Mar 20, 2025 -
Convert RT-DETR model to coreml
#35905 closed
Mar 20, 2025 -
[bug] fast_image_processor register error
#36715 closed
Mar 19, 2025 -
When what needs to be loaded is in the cache directory, there is no need to make a request to the remote
#36762 closed
Mar 19, 2025 -
In the _speculative_sampling function, it seems that the "squeeze" method is being used incorrectly.
#36810 closed
Mar 19, 2025 -
AttributeError: 'Gemma3Config' object has no attribute 'vocab_size'
#36683 closed
Mar 19, 2025 -
text-to-video_app
#36747 closed
Mar 19, 2025 -
model from_pretrained bug in 4.50.dev0 in these days
#36506 closed
Mar 19, 2025 -
Subtle difference with Pytorch AdamW?
#35504 closed
Mar 19, 2025 -
Qwen2VL exhibits significant performance differences under different attention implementations.
#35749 closed
Mar 19, 2025 -
[Phi-3-mini-128k-instruct] Difference of encodings for Slow and Fast Tokenizer
#35973 closed
Mar 19, 2025 -
Training loss not showing with trainer
#36102 closed
Mar 19, 2025 -
Gemma3 minimal fine tuning example?
#36714 closed
Mar 18, 2025 -
Shape mismatch in RoPE embeddings gpt_neox model when rotary_ndims is odd
#35233 closed
Mar 18, 2025 -
incorrect special_tokens_mask
#35897 closed
Mar 18, 2025 -
Llama tokenizer newline character inconsistency
#35923 closed
Mar 18, 2025 -
flex_attention does not output the full attention_weights with output_attention option
#36096 closed
Mar 18, 2025 -
bug in save checkpoint
#36099 closed
Mar 18, 2025 -
qwen2_5_vl processor padding side is wrong.
#36100 closed
Mar 18, 2025 -
ValueError: weight is on the meta device, we need a `value` to put in on 0. `Gemma3`
#36766 closed
Mar 17, 2025 -
Misleading documentation for `is_decoder` configuration parameter
#36482 closed
Mar 17, 2025 -
On MoE implementation in HuggingFace
#36730 closed
Mar 17, 2025 -
bus error on version 4.43.0 with pretrained community CLIP model - MacOS
#33357 closed
Mar 17, 2025 -
Cannot load siglip2 processor
#36665 closed
Mar 16, 2025 -
SFTConfig.__init__() got an unexpected keyword argument 'optimizers'
#36749 closed
Mar 16, 2025 -
Model.generate use_cache=True generates different results than use_cache=False
#36536 closed
Mar 16, 2025 -
past_key_values type support bug
#36057 closed
Mar 16, 2025 -
TypeError: empty() missing 1 required positional arguments: "size"
#36061 closed
Mar 16, 2025 -
Transformers can create unconventional python module names when loading certain repositories
#35570 closed
Mar 15, 2025 -
ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#36010 closed
Mar 15, 2025 -
[feature request] Callback handler event after forward pass in Trainer
#36012 closed
Mar 15, 2025 -
AMD CI tracking issue
#36019 closed
Mar 15, 2025 -
Issue in resuming finetuning Llama 3.1 Instruct Model
#36035 closed
Mar 15, 2025 -
Initializing via AutoImageProcessor before AutoProcessor is imported causes `AttributeError`
#34307 closed
Mar 14, 2025 -
Trainer sets `state.best_model_checkpoint` even when it doesn't save there; leads to training crash
#35609 closed
Mar 14, 2025 -
'MERTConfig' object has no attribute 'conv_pos_batch_norm'
#36134 closed
Mar 14, 2025 -
Some questions of `Gemma3` processor
#36701 closed
Mar 14, 2025 -
NotImplementedError: aten::_log_softmax_backward_data with SparseCUDA backend
#36674 closed
Mar 14, 2025 -
Component loading incorrect dtype
#36686 closed
Mar 13, 2025 -
`disable_compile` not honored as a kwarg in generate
#36544 closed
Mar 13, 2025 -
AutoModel failed with empty tensor error
#36579 closed
Mar 13, 2025 -
Some methods in TrainerControl seem not to be utilized.
#36576 closed
Mar 13, 2025 -
save_only_model with FSDP throws FileNotFoundError error
#36626 closed
Mar 13, 2025 -
Cannot import 'GenerationOutput' in 4.48.1
#35957 closed
Mar 13, 2025 -
GPTQ quantization on Jetson Orin Nano
#36139 closed
Mar 12, 2025 -
past_key_values not being set in model_inputs keys
#36001 closed
Mar 12, 2025 -
The number of safetensors files is different when using CPU and CUDA.
#36595 closed
Mar 11, 2025 -
Downloading models in distributed training
#36414 closed
Mar 11, 2025 -
Warning related to torch.tensor() usage in transformers.models.encodec.modeling_encodec.py (Version 4.47.0)
#36533 closed
Mar 11, 2025 -
Loading a pipeline with `trust_remote_code=True` raises warning
#36273 closed
Mar 11, 2025 -
Extract embeddings of many seqs using ESM2
#36641 closed
Mar 11, 2025 -
Error faced during Finetuning Deepseek-vl2
#36633 closed
Mar 11, 2025 -
paligemma2-3B-mix in version4.49.0 not use GPU and 4.50.0.dev broken
#36575 closed
Mar 11, 2025 -
Inconsistent Outputs When Using Flash Attention 2 and SDPA Attention with Attention Mask
#36585 closed
Mar 11, 2025 -
AttributeError: 'MERTConfig' object has no attribute 'conv_pos_batch_norm'
#35656 closed
Mar 10, 2025 -
Why are there so many variables named layrnorm in the codebase?
#36623 closed
Mar 10, 2025 -
Memory Access out of bounds in mra/cuda_kernel.cu::index_max_cuda_kernel()
#35507 closed
Mar 10, 2025 -
Very slow to load DeepSeek-V3 int4 model and device_map="auto"/"sequential" bug
#35522 closed
Mar 10, 2025 -
adalomo and deepspeed zero3 offload error
#35977 closed
Mar 10, 2025 -
size mismatch for lm_head when finetuning QWEN2.5
#36550 closed
Mar 10, 2025 -
Llama3 tokenizer decode is incorrect for ' ...' with leading space
#36622 closed
Mar 9, 2025 -
Tokenizer does not split text according to newly added input tokens
#35447 closed
Mar 9, 2025 -
Can't use Trainer on mps device
#35954 closed
Mar 9, 2025 -
Significant Increase in Computation Time When Using Attention Mask in SDPA Attention
#36584 closed
Mar 8, 2025 -
Accidentally allocating 2x memory in new caching_allocator_warmup
#36483 closed
Mar 7, 2025 -
Open Object Detection Leaderboard: Model Requests not working
#36034 closed
Mar 7, 2025 -
TypeError: LlavaProcessor: got multiple values for keyword argument 'images'
#36578 closed
Mar 7, 2025 -
Attention can be None in ModernBertForSequenceClassification
#35917 closed
Mar 7, 2025 -
Lora_B weight becomes 0 when using AutoModel
#36594 closed
Mar 6, 2025 -
Do trailing padding tokens get a forward pass?
#36565 closed
Mar 6, 2025 -
Init on meta device and then materialize on gpu leads to very large errors
#36577 closed
Mar 6, 2025 -
how to use transformers with musicgen with float16
#36546 closed
Mar 6, 2025 -
AttributeError: 'dict' object has no attribute '_attn_implementation_internal'
#35900 closed
Mar 6, 2025 -
Dtensor support requires torch>=2.5.1
#36472 closed
Mar 5, 2025 -
Groq inference provider
#36353 closed
Mar 4, 2025 -
After tokenizers upgrade, the length of the token does not correspond to the length of the model
#36532 closed
Mar 4, 2025 -
Incorrect Whisper long-form decoding timestamps
#31942 closed
Mar 4, 2025 -
Help Understanding Beam Search Scores in Hugging Face (LLaMA + LoRA)
#35618 closed
Mar 4, 2025 -
ERROR: Video features and Video Tokens do not match!!!
#35869 closed
Mar 4, 2025 -
tokenizers.apply_chat_template with continue_final_message=True with </think> token
#36440 closed
Mar 3, 2025 -
tokenizers.apply_chat_template with `continue_final_message=True` with trailing spaces in input
#35433 closed
Mar 3, 2025 -
Confusing behavior when loading PEFT models with pipeline
#36473 closed
Mar 3, 2025 -
`_load_state_dict_into_meta_model` - `'NoneType' object has no attribute 'load_state_dict'`
#36495 closed
Mar 3, 2025 -
GRPO Reward Weight Scheduler
#36490 closed
Mar 3, 2025 -
please support register_full_backward_pre_hook and register_full_backward_hook
#36507 closed
Mar 3, 2025 -
Some Whisper beam search output (sequences_scores, etc.) is lost in _stack_split_outputs
#32373 closed
Mar 3, 2025 -
[DEV Testing] Issues with `test_modeling_common`
#35857 closed
Mar 3, 2025 -
[BUG] NPU ZeRO-3: training a custom model raises "Function SumBackward0 returned an invalid gradient at index 0"
#36387 closed
Mar 3, 2025 -
Load siglip2 error
#36475 closed
Mar 3, 2025 -
`padding_side` is of type `bool` when it should be `Literal['right', 'left']`
#36252 closed
Mar 3, 2025 -
Add Wan model into Transformers
#36494 closed
Mar 2, 2025 -
Bug introduced in `_load_state_dict_into_meta_model` and `to` `v4.49.0`..`v4.50.0.dev`
#36441 closed
Mar 1, 2025 -
`model.config.to_diff_dict()` delivers different result to `model.save_pretrained()`
#35426 closed
Mar 1, 2025 -
AttributeError: 'Config' object has no attribute '_get_non_default_generation_parameters'
#35543 closed
Mar 1, 2025 -
Prompt_ids feature causing repetitions and hallucinations
#35603 closed
Mar 1, 2025 -
convert_llama_weight_to_hf.py
#35820 closed
Mar 1, 2025 -
suppress_tokens=[] should be legal as some older Whisper models rely on this
#36341 closed
Feb 28, 2025 -
model.generate() produces different outputs with padding for flan-t5-small
#36461 closed
Feb 28, 2025 -
Failed to import transformers.models.auto.modeling_auto because numpy.core.multiarray failed to import
#36343 closed
Feb 28, 2025 -
KerasTensor can't be used with TFBertTokenizer
#36462 closed
Feb 28, 2025 -
about siglip2
#36470 closed
Feb 28, 2025 -
AutoModelForObjectDetection isn't working due to wrong output size
#36464 closed
Feb 28, 2025 -
Question for community: We're considering adding `pydantic` as a base requirement to 🤗 `transformers`
#36329 closed
Feb 28, 2025 -
How to change data
#35807 closed
Feb 28, 2025 -
test
#36471 closed
Feb 28, 2025 -
Error splitting the input into NAL units.
#36448 closed
Feb 27, 2025 -
Does KTransformers support concurrent requests when running the quantized DeepSeek-R1? Do the required resources also scale proportionally with concurrency?
#36423 closed
Feb 27, 2025 -
Mamba2 doesn't support Multi-GPU training (fast path)
#35770 closed
Feb 27, 2025 -
TPU Initialization Error with Transformers in Kaggle TPU VM v3-8
#35774 closed
Feb 27, 2025 -
Apply dualpipe from deepseek-v3 to a trainer or model
#36439 closed
Feb 27, 2025
116 Issues opened by 112 people
-
Gemma3: new token <image_soft_token> has been added accidentally
#37011 opened
Mar 26, 2025 -
[Question] Handling of custom flex attention block masks
#37006 opened
Mar 26, 2025 -
GGUF model with architecture gemma3 is not supported yet.
#37002 opened
Mar 26, 2025 -
Add ArlowGPT
#36988 opened
Mar 26, 2025 -
FSDP Not Working For Mamba2
#36982 opened
Mar 25, 2025 -
[Community contributions] Model cards
#36979 opened
Mar 25, 2025 -
[Contributions Welcome] Add Fast Image Processors
#36978 opened
Mar 25, 2025 -
QuestionAnswering for Gemma 3
#36977 opened
Mar 25, 2025 -
Gemma3: Cuda error: misaligned address
#36961 opened
Mar 25, 2025 -
Symbolic trace with past_key_values input is not supported yet for Qwen2.
#36959 opened
Mar 25, 2025 -
Started getting new warnings for gemma3 after upgrading from 4.49.0-gemma3 to 4.50.0
#36942 opened
Mar 24, 2025 -
Add param_to_hook_all_reduce parameter in HF Trainer
#36941 opened
Mar 24, 2025 -
Gemma3 not supported in main branch
#36940 opened
Mar 24, 2025 -
AttributeError: 'HybridCache' object has no attribute 'float' — PaliGemma2 Evaluation Fails with BF16
#36938 opened
Mar 24, 2025 -
python_interpreter.py seems not to support asyncio.run()
#36920 opened
Mar 24, 2025 -
'Cache only has 0 layers' during generation after upgrading Transformers from 4.49 to 4.50
#36913 opened
Mar 24, 2025 -
`lm_head.weight` missing from `convert_mistral_weights_to_hf.STATE_DICT_MAPPING`
#36908 opened
Mar 23, 2025 -
PixtralVisionModel does not support Flash Attention 2.0 yet
#36904 opened
Mar 22, 2025 -
Warning: "No label_names provided for PeftModel" persists despite dataset containing "labels" column
#36902 opened
Mar 22, 2025 -
groot n1
#36900 opened
Mar 22, 2025 -
GPT2Model model output inconsistency between different transformers versions
#36897 opened
Mar 22, 2025 -
Forced to hit `UserWarning` when generating with `temperature=0`
#36896 opened
Mar 21, 2025 -
Add RF-DETR model
#36879 opened
Mar 21, 2025 -
Qwen2-VL-7B-Instruct shape error when using TP=4
#36875 opened
Mar 21, 2025 -
Support for SpatialLM series model
#36874 opened
Mar 21, 2025 -
Optimize tokenizer.decode() Performance for `List[int]` Inputs
#36872 opened
Mar 21, 2025 -
Multiple processor classes have input side-effects
#36865 opened
Mar 20, 2025 -
Gemma3 (and Paligemma) position_ids 1-indexed?
#36856 opened
Mar 20, 2025 -
Facing RunTime Attribute error while running different Flax models for RoFormer
#36854 opened
Mar 20, 2025 -
Transformers_model
#36846 opened
Mar 20, 2025 -
Unable to load google/siglip2-so400m-patch14-384/
#36845 opened
Mar 20, 2025 -
GOT-OCR2 docs indicate model can produce markdown, but it only produces LaTeX.
#36836 opened
Mar 19, 2025 -
Build for Windows and VS 2022 does not compile CUDA sources
#36830 opened
Mar 19, 2025 -
Support for Ovis2 models
#36824 opened
Mar 19, 2025 -
Gemma 3 is broken with fp16
#36822 opened
Mar 19, 2025 -
Need Option to Disable Flash Attention in VideoLLaMA2.1-7B-AV (SiglipVisionModel)
#36819 opened
Mar 19, 2025 -
Add EuroBert Model To Config
#36817 opened
Mar 19, 2025 -
Gemma3 can't be fine-tuned on multi-image examples
#36816 opened
Mar 19, 2025 -
Gemma3
#36815 opened
Mar 19, 2025 -
Not able to trace GPT2DoubleHeadsModel
#36812 opened
Mar 19, 2025 -
Logic Errors in Image_processing_gemma3_fast.py
#36806 opened
Mar 19, 2025 -
Qwen2VLForConditionalGeneration.from_pretrained() hangs with v0.50.0-dev0
#36803 opened
Mar 18, 2025 -
BERT is broken on `v4.49.0-Gemma-3`
#36802 opened
Mar 18, 2025 -
Throw messages in text-generation task with deepseek r1 with PEFTModel
#36783 opened
Mar 18, 2025 -
Please support GGUF format for UMT5EncoderModel
#36774 opened
Mar 17, 2025 -
Inconsistent Documentation for `dataset_index` Requirement Across ViTPose Models
#36773 opened
Mar 17, 2025 -
Add Audio inputs available in apply_chat_template
#36769 opened
Mar 17, 2025 -
Source link to Ray Tune API outdated
#36765 opened
Mar 17, 2025 -
could not parse ModelProto from /home/imss/zxhhhh/llama-3-8b/tokenizer.model
#36764 opened
Mar 17, 2025 -
Add Gemma 3 For Sequence Classification
#36755 opened
Mar 16, 2025 -
Unable to load google/siglip2-base-patch16-naflex
#36754 opened
Mar 16, 2025 -
IdeficsProcessor cannot handle multiple images in one text
#36751 opened
Mar 16, 2025 -
Gemma 3 1B - TypeError: 'NoneType' object is not callable
#36745 opened
Mar 15, 2025 -
Model Card to include key information (e.g. max_sequence_length, etc.)
#36743 opened
Mar 15, 2025 -
Unable to deploy Gemma 3 on AWS SageMaker due to lack of support in transformers release
#36738 opened
Mar 15, 2025 -
Error when tokenizer is set to string: `AttributeError: 'str' object has no attribute 'pad_token_id'`
#36731 opened
Mar 14, 2025 -
`torch.compile` custom backend called by AotAutograd triggers recompiles when used with `CompileConfig`
#36725 opened
Mar 14, 2025 -
trainer.train()
#36723 opened
Mar 14, 2025 -
Add RoMa keypoint matcher
#36718 opened
Mar 14, 2025 -
`return_assistant_tokens_mask` argument is blocked in `ProcessorMixin.apply_chat_template`
#36713 opened
Mar 14, 2025 -
Support Flex Attention for encoder only models (XLMRoberta, ModernBERT etc...)
#36697 opened
Mar 13, 2025 -
Transformers 4.49.0 breaks nvdiffrast plugin loading
#36676 opened
Mar 12, 2025 -
The parameter 'text' may be None, as the comments say; this is confusing.
#36667 opened
Mar 12, 2025 -
[FEAT] [non-CUDA]: Support alternative implementation for `constraints.positive_definite.check`
#36660 opened
Mar 12, 2025 -
Qwen2 MoE manual `head_dim`
#36659 opened
Mar 12, 2025 -
Cannot run backward with tensor parallel
#36657 opened
Mar 12, 2025 -
AutoModel from_pretrained does not recursively download relative imports
#36653 opened
Mar 12, 2025 -
Marian RNN conversion support
#36651 opened
Mar 11, 2025 -
Hybrid models
#36646 opened
Mar 11, 2025 -
[Feature Request]: refactor _update_causal_mask to a public utility
#36640 opened
Mar 11, 2025 -
[BUG] Batch inference DDP + zero stage 3 = inference code hangs
#36638 opened
Mar 11, 2025 -
`output_hidden_states` only return part of hidden_state when setting `device_map="auto"`
#36636 opened
Mar 10, 2025 -
Difficulties with multi-GPU Inferencing
#36634 opened
Mar 10, 2025 -
Add Magma from Microsoft to Transformers
#36629 opened
Mar 10, 2025 -
Unable to use converted Llama 3.3 instruct model
#36628 opened
Mar 10, 2025 -
[deepspeed] any plans for deepspeed-domino?
#36624 opened
Mar 10, 2025 -
Can not use flash-attention and flash-varlen-attention on Ascend NPU
#36618 opened
Mar 9, 2025 -
In "02_how_to_generate", code cell 1 has an error message
#36613 opened
Mar 8, 2025 -
Not installable on arm64 due to jaxlib upper bound
#36611 opened
Mar 7, 2025 -
Making attention mechanism stackable
#36609 opened
Mar 7, 2025 -
Whisper pipeline returns empty segment for each processed audio chunk
#36602 opened
Mar 7, 2025 -
lm_head parameters missing from named_parameters() in Qwen2.5-VL-3B-Instruct model
#36598 opened
Mar 7, 2025 -
Error when changing vocab size when fine tuning llama-vision
#36590 opened
Mar 6, 2025 -
After tokenizers upgrade, the length of the token does not correspond to the length of the model
#36574 opened
Mar 6, 2025 -
txt2video
#36573 opened
Mar 6, 2025 -
In the latest version of transformers (4.49.0) matrix transformation error is encountered
#36571 opened
Mar 6, 2025 -
torch_dtype is actually used now?
#36567 opened
Mar 5, 2025 -
Add support for StableAdamW optimizer in Trainer
#36564 opened
Mar 5, 2025 -
Stop output to stdout in streamers.py methods
#36562 opened
Mar 5, 2025 -
Improving expected test results
#36561 opened
Mar 5, 2025 -
Allow video objects (np array etc.) in apply_chat_template (not just paths or urls)
#36560 opened
Mar 5, 2025 -
Error during processing: MllamaForCausalLM does not support Flash Attention 2.0 yet.
#36557 opened
Mar 5, 2025 -
Facing issue while getting model from Rag from_pretrained
#36548 opened
Mar 5, 2025 -
Wrong dependency: `"tensorflow-text<2.16"`
#36541 opened
Mar 4, 2025 -
Bug when computing positional IDs from embeddings
#36537 opened
Mar 4, 2025 -
Bug in LlavaNextProcessor when using do_pad=False
#36531 opened
Mar 4, 2025 -
Request: Add Flash Attention 2.0 Support for ViTMAEForPreTraining
#36527 opened
Mar 4, 2025 -
GraniteMoE’s implementation is not compatible with HF’s peft
#36518 opened
Mar 3, 2025 -
Object detection tutorial uses buggy dataset, may lead to crash during training
#36516 opened
Mar 3, 2025 -
model.generate function is not compatible with custom position_ids
#36510 opened
Mar 3, 2025 -
Can not use prompt tuning inference
#36509 opened
Mar 3, 2025 -
huggingface model does not support torch backward hooks
#36508 opened
Mar 3, 2025 -
add a param to control cache in streamer when returning output
#36505 opened
Mar 3, 2025 -
TypeError: object of type 'IterableDataset' has no len()
#36501 opened
Mar 3, 2025 -
Support Distill Depth Anything
#36499 opened
Mar 2, 2025 -
Error at scatter num_items_in_batch in ddp/dp
#36492 opened
Mar 2, 2025 -
llama code breaks with torch compile
#36484 opened
Mar 1, 2025 -
Add type checking to CI
#36481 opened
Feb 28, 2025 -
ViTPose tutorial fails
#36454 opened
Feb 27, 2025 -
Error in tiktoken integration example
#36438 opened
Feb 26, 2025 -
Unable to save model after training with tensor parallel
#36436 opened
Feb 26, 2025
176 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Janus model
#36053 commented on
Mar 20, 2025 • 144 new comments -
Add EfficientLoFTR model
#36355 commented on
Mar 19, 2025 • 75 new comments -
Add FAST
#35476 commented on
Mar 26, 2025 • 75 new comments -
Samhq model addition
#35147 commented on
Mar 26, 2025 • 39 new comments -
Add MLCD model
#36182 commented on
Mar 25, 2025 • 33 new comments -
Add TimesFM Time Series Forecasting Model
#34082 commented on
Mar 25, 2025 • 29 new comments -
Add StyleTTS 2
#35790 commented on
Mar 19, 2025 • 27 new comments -
Add evolla rebase main
#36232 commented on
Mar 24, 2025 • 19 new comments -
Add DeepSeek V2 Model into Transformers
#36400 commented on
Mar 26, 2025 • 19 new comments -
Add InternVL (2.5 MPO)
#35968 commented on
Mar 25, 2025 • 17 new comments -
Update deprecated/unused dependencies 🧹 🧹
#36419 commented on
Mar 4, 2025 • 14 new comments -
Add padding-free to bamba
#35861 commented on
Mar 14, 2025 • 13 new comments -
Add support for MiniMax's MiniMax-Text-01
#35831 commented on
Mar 20, 2025 • 13 new comments -
Add index selection for `output_hidden_states`
#33705 commented on
Mar 18, 2025 • 13 new comments -
`GPT2Model` StaticCache support
#35761 commented on
Mar 26, 2025 • 12 new comments -
Add Doge model
#35891 commented on
Mar 22, 2025 • 12 new comments -
Introduce modular files for speech models
#35902 commented on
Mar 21, 2025 • 10 new comments -
Add D-FINE Model into Transformers
#36261 commented on
Mar 26, 2025 • 6 new comments -
[MLU] Fix FA2 check error, remove deepspeed-mlu deps.
#36159 commented on
Mar 26, 2025 • 6 new comments -
make `num_items_in_batch` optional in compute_loss_func
#36426 commented on
Mar 26, 2025 • 5 new comments -
Fix Mask2Former Weight Initialization Issues #35877
#35904 commented on
Mar 24, 2025 • 5 new comments -
Add Segment Anything 2 (SAM2)
#32317 commented on
Mar 25, 2025 • 4 new comments -
switch from `training_args.bin` to `training_args.json`
#35010 commented on
Mar 10, 2025 • 4 new comments -
Add internlm3 dense
#35694 commented on
Mar 19, 2025 • 4 new comments -
Fix Batch Size Mismatch When Using `crops_n_layers` in `mask-generation` Pipeline #35530
#35627 commented on
Mar 20, 2025 • 3 new comments -
fix: support grad clipping for TP through replicating non-sharded modules
#36132 commented on
Mar 25, 2025 • 3 new comments -
Support QuestionAnswering Module for ModernBert based models.
#35566 commented on
Mar 25, 2025 • 2 new comments -
Add LightGlue model
#31718 commented on
Mar 10, 2025 • 2 new comments -
Introduce numpy/numba optimization to `Qwen2VLImageProcessor`
#36356 commented on
Feb 28, 2025 • 2 new comments -
Append best model checkpoint with active adapter when not default
#36201 commented on
Mar 13, 2025 • 2 new comments -
Flash Attention v3
#36190 commented on
Mar 24, 2025 • 1 new comment -
fix: dtype might change during resize
#36089 commented on
Feb 27, 2025 • 1 new comment -
[Whisper] Pipeline: handle long form generation
#35750 commented on
Mar 13, 2025 • 1 new comment -
[WIP]: Base multimodal model for VLLM's `transformers` backend
#36367 commented on
Mar 26, 2025 • 1 new comment -
Integrate xlstm cleanly.
#35377 commented on
Mar 25, 2025 • 1 new comment -
[i18n-zh] Translating kv_cache into zh-hans
#36412 commented on
Feb 27, 2025 • 1 new comment -
enable tp on CPU
#36299 commented on
Mar 26, 2025 • 1 new comment -
Add AIMv2 to Transformers
#35550 commented on
Mar 3, 2025 • 0 new comments -
Process inputs directly in apply_chat_template in image-text-to-text pipeline
#35616 commented on
Mar 24, 2025 • 0 new comments -
[Fix]Integrate LLM generation parameters into the evaluation method
#36416 commented on
Feb 27, 2025 • 0 new comments -
Bart: new cache format
#35314 commented on
Mar 14, 2025 • 0 new comments -
[generation] Support cache-cropping methods
#35591 commented on
Mar 11, 2025 • 0 new comments -
🔴 Video processors as a separate class
#35206 commented on
Mar 3, 2025 • 0 new comments -
Add Relation DETR
#34900 commented on
Mar 20, 2025 • 0 new comments -
Bye bye env vars, keep everything as configs
#34886 commented on
Mar 19, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Mar 21, 2025 • 0 new comments -
[`AutoDocstring`] Based on inspect parsing of the signature
#33771 commented on
Mar 17, 2025 • 0 new comments -
#33512 handle last element out of range error
#33625 commented on
Mar 12, 2025 • 0 new comments -
Update from pretrained error when loading
#33380 commented on
Mar 24, 2025 • 0 new comments -
Adding new zero-shot examples
#32483 commented on
Feb 28, 2025 • 0 new comments -
Skip non-selected experts for mixtral and qwen2_moe
#32429 commented on
Mar 11, 2025 • 0 new comments -
Trainer: add predict with generate
#32346 commented on
Mar 24, 2025 • 0 new comments -
Improve support for image generation with Chameleon & Anole
#32013 commented on
Mar 19, 2025 • 0 new comments -
Support Kosmos-2.5
#31711 commented on
Mar 25, 2025 • 0 new comments -
audio pipeline support for initial_prompt?
#27317 commented on
Mar 26, 2025 • 0 new comments -
warning bug in Qwen2DecoderLayer in transformers ==4.49
#36361 commented on
Mar 26, 2025 • 0 new comments -
The arguments in `utils/modular_model_converter.py` is different from those in docs
#36362 commented on
Mar 26, 2025 • 0 new comments -
What inference speed (tokens/s) do the full and 4-bit quantized DeepSeek-R1 models achieve with KTransformers for inference, and what compute configuration does each require?
#36363 commented on
Mar 26, 2025 • 0 new comments -
Support set non_blocking=True when move data from cpu to gpu
#36408 commented on
Feb 28, 2025 • 0 new comments -
[generate] Run custom generation code from the Hub
#36405 commented on
Feb 27, 2025 • 0 new comments -
Handle DAC conversion when using weight_norm with newer PyTorch versions
#36393 commented on
Mar 2, 2025 • 0 new comments -
Added Cosmos model files
#36389 commented on
Feb 27, 2025 • 0 new comments -
Fix: Use config.use_sliding_window instead of config.sliding_window
#36377 commented on
Mar 21, 2025 • 0 new comments -
chore(qwen2): display warning log only when sliding window attention …
#36316 commented on
Mar 8, 2025 • 0 new comments -
[`ModernBERT`] Never save 'reference_compile' config; should be set based on end user
#36305 commented on
Mar 20, 2025 • 0 new comments -
Update composition flag usage
#36263 commented on
Mar 19, 2025 • 0 new comments -
Add support for DeepseekAI's DeepseekVL
#36248 commented on
Mar 26, 2025 • 0 new comments -
Improvements in attention_forward functions
#36218 commented on
Mar 14, 2025 • 0 new comments -
Fix the eval_use_gather_object flag usage
#36214 commented on
Mar 18, 2025 • 0 new comments -
fix: condition bos_token_id and space as token
#36211 commented on
Mar 19, 2025 • 0 new comments -
Fixed dynamic module import when there is more than one dot in class …
#36198 commented on
Mar 24, 2025 • 0 new comments -
(ugly) Use `parallelism=4` for `check_repository_consistency`
#36197 commented on
Mar 20, 2025 • 0 new comments -
Set evaluation and checkpointing defaults to 'epoch' and reduce loggi…
#36133 commented on
Mar 13, 2025 • 0 new comments -
Add Phi-3.5-vision
#36036 commented on
Mar 14, 2025 • 0 new comments -
[ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification
#35991 commented on
Mar 7, 2025 • 0 new comments -
Idefics: remove double BOS token
#35950 commented on
Mar 11, 2025 • 0 new comments -
[ModernBERT] Add CausalLM functionality to ModernBERT
#35946 commented on
Mar 3, 2025 • 0 new comments -
[WIP] add deepseek-v3
#35926 commented on
Mar 26, 2025 • 0 new comments -
Missing weights not initialized properly #35437
#35913 commented on
Feb 27, 2025 • 0 new comments -
Several fixes related to rotary position embeddings
#35901 commented on
Mar 19, 2025 • 0 new comments -
Adds GGUF support for Gemma models
#35887 commented on
Mar 4, 2025 • 0 new comments -
Add MultipleChoice & QuestionAnswering heads to ModernBERT
#35825 commented on
Mar 3, 2025 • 0 new comments -
Remove head mask in generative models
#35786 commented on
Mar 19, 2025 • 0 new comments -
Add ColQwen2 to 🤗 transformers
#35778 commented on
Mar 26, 2025 • 0 new comments -
fix immediate quantization of the first token in QuantizedCache
#35760 commented on
Mar 20, 2025 • 0 new comments -
Pipeline: fix unnecessary warnings
#35753 commented on
Mar 17, 2025 • 0 new comments -
Mask2former & Maskformer Fast Image Processor
#35685 commented on
Mar 7, 2025 • 0 new comments -
[docs] add return_timestamps=True for Whisper long-form transcription
#35633 commented on
Mar 20, 2025 • 0 new comments -
Problem about using mBART50 for Russian to Chinese translation
#13116 commented on
Mar 7, 2025 • 0 new comments -
LayerDrop broken in various Flax models (Whisper/BART/more...)
#35468 commented on
Mar 8, 2025 • 0 new comments -
`Llama-3.2-11B-Vision-Instruct` (`mllama`) FSDP fails if grad checkpointing is enabled
#36040 commented on
Mar 8, 2025 • 0 new comments -
DeepSeek V3 Support
#35425 commented on
Mar 8, 2025 • 0 new comments -
Unknown quantization type, got fp8
#35471 commented on
Mar 9, 2025 • 0 new comments -
Missing weights are not properly initialized when using model.from_pretrained()
#35437 commented on
Mar 9, 2025 • 0 new comments -
[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float'
#33552 commented on
Mar 10, 2025 • 0 new comments -
Inconsistent saving of tokenizer with custom code from HF hub vs. local directory
#35597 commented on
Mar 10, 2025 • 0 new comments -
Mask2FormerImageProcessor support overlapping features
#35536 commented on
Mar 11, 2025 • 0 new comments -
Add the support for deepseek architecture .gguf
#36144 commented on
Mar 13, 2025 • 0 new comments -
FSDP Torch XLA vs. FSDPv2 (SMPD) Torch XLA checkpoint saving bug
#36004 commented on
Mar 13, 2025 • 0 new comments -
`trainer.evaluate` always creates a new MLFlow run, separate from the one used during `train()`
#35074 commented on
Mar 13, 2025 • 0 new comments -
A word-level timestamps on whisper generation pipeline is mismatched to total duration
#36228 commented on
Mar 13, 2025 • 0 new comments -
Device Movement Error with 4-bit Quantized LLaMA 3.1 Model Loading
#36272 commented on
Mar 13, 2025 • 0 new comments -
XLA FSDP V2 + TPU + T5 Family Models doesn't work
#35142 commented on
Mar 13, 2025 • 0 new comments -
oom when using adafactor optimizer in deepspeed
#33290 commented on
Mar 13, 2025 • 0 new comments -
Request to add DINO object detector
#36205 commented on
Mar 14, 2025 • 0 new comments -
TypeError: ModernBertModel.forward() got an unexpected keyword argument 'num_items_in_batch'
#36074 commented on
Mar 14, 2025 • 0 new comments -
WhisperForCTC
#26242 commented on
Mar 14, 2025 • 0 new comments -
Add support for context parallelism
#35983 commented on
Mar 14, 2025 • 0 new comments -
denoising with sentence permutation, and language sampling
#11129 commented on
Mar 15, 2025 • 0 new comments -
Jitter Noise added to input being passed to experts in Switch Transformers
#33969 commented on
Mar 15, 2025 • 0 new comments -
Support sliding_window for sdpa in qwen2
#36351 commented on
Feb 27, 2025 • 0 new comments -
Add cosmos from Nvidia
#35565 commented on
Feb 27, 2025 • 0 new comments -
add Flash Attention Support for Helsinki-NLP/opus models
#36169 commented on
Feb 28, 2025 • 0 new comments -
[Feature Request] We might need a function to change the sampler used in trainer dataloader
#26802 commented on
Feb 28, 2025 • 0 new comments -
Error From BitsandBytes
#36371 commented on
Feb 28, 2025 • 0 new comments -
Export to ExecuTorch
#32253 commented on
Mar 1, 2025 • 0 new comments -
Speed up image processors - cast to array before BatchFeature
#31205 commented on
Mar 2, 2025 • 0 new comments -
Support SDPA & Flash Attention 2 for LayoutLMv3
#35467 commented on
Mar 2, 2025 • 0 new comments -
Support H100 training with FP8 in Trainer and Deepspeed
#25333 commented on
Mar 2, 2025 • 0 new comments -
SAM mask-generation - crops_n_layers
#35530 commented on
Mar 3, 2025 • 0 new comments -
_batch_encode_plus() got an unexpected keyword argument 'is_pretokenized' using BertTokenizerFast
#17488 commented on
Mar 4, 2025 • 0 new comments -
Add EVEv2 : an Encoder-free VLM
#36379 commented on
Mar 4, 2025 • 0 new comments -
Saving model with shared tensors fails on cpu but succeeds on gpu
#33688 commented on
Mar 4, 2025 • 0 new comments -
Add `Tensor Parallel` support for ALL models
#34789 commented on
Mar 4, 2025 • 0 new comments -
Possible bug when using cosine lr scheduler with gradient accumulation
#35484 commented on
Mar 4, 2025 • 0 new comments -
- (False?) warning about weight_g/weight_v missing on WeightNorm on PyTorch (#26796, commented on Mar 4, 2025)
- Add support for Molmo (#33710, commented on Mar 4, 2025)
- The output tensor's data type is not torch.long when the input text is empty (#36277, commented on Mar 4, 2025)
- Is the T5 model supported with HQQ quantization? (AttributeError: 'HQQLinear' object has no attribute 'weight') (#36254, commented on Mar 4, 2025)
- Redirect logging output to `stdout` instead of `stderr` (#34613, commented on Mar 6, 2025)
- Enable quantized KV cache for the Mistral model (#35041, commented on Mar 7, 2025)
- llama `tie_word_embeddings` ignored on CPU and with auto dtype only (#33689, commented on Mar 7, 2025)
- Rework `test_multi_gpu_data_parallel_forward` (#31087, commented on Mar 22, 2025)
- flash_attention_2 2.7.2.post1 seems to crash when using `torch.compile` and `DataCollatorWithFlattening` (#35588, commented on Mar 22, 2025)
- PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration (#10105, commented on Mar 23, 2025)
- DeepSpeed ZeRO-3 zero3_save_16bit_model is not compatible with resume_from_checkpoint (#36317, commented on Mar 23, 2025)
- Bug with num_update_steps_per_epoch in the _inner_training_loop function (#36297, commented on Mar 23, 2025)
- Tensor parallel training bug (#36296, commented on Mar 23, 2025)
- safetensors/mmap memory leak when per-layer weights are converted to other dtypes (#34366, commented on Mar 23, 2025)
- Add an argument to set the number of eval steps in Trainer (#31561, commented on Mar 24, 2025)
- Assisted generation slower than with the base model alone (#36337, commented on Mar 24, 2025)
- Unable to use Seq2SeqTrainingArguments and Seq2SeqTrainer (#36330, commented on Mar 24, 2025)
- Whisper word-level timestamp extraction fails with beam search (#36093, commented on Mar 24, 2025)
- Multi-GPU training crashes with IterableDataset and different-length inputs (e.g. next-token prediction) (#35308, commented on Mar 24, 2025)
- Inference with FSDP during training affects checkpoints (#34530, commented on Mar 24, 2025)
- Mask2Former _init_weights (#35877, commented on Mar 24, 2025)
- Recomputed tensor size does not match when using activation checkpointing with FSDP and accelerate (#34928, commented on Mar 25, 2025)
- ValueError: Trying to set a tensor of shape torch.Size([128256, 3072]) in "weight" (which has shape torch.Size([128003, 3072])), this looks incorrect (#36350, commented on Mar 25, 2025)
- Accelerate x Trainer issue tracker (#33345, commented on Mar 25, 2025)
- CVE-2024-11392: AWS Scanner and Trivy flagging Transformers 4.48.1 as vulnerable (#36041, commented on Mar 25, 2025)
- Multi-GPU: test_model_parallel_beam_search tests fail with "IndexError: list index out of range" (#35824, commented on Mar 25, 2025)
- `Helsinki-NLP/opus-mt-it-en` isn't on the Hugging Face Hub (#26382, commented on Mar 25, 2025)
- AttributeError: 'dict' object has no attribute 'to_dict' when running inference with LoRA-merged Qwen/Qwen2.5-VL-3B-Instruct (#36281, commented on Mar 25, 2025)
- TypeError: CustomTrainer.compute_loss() got an unexpected keyword argument 'num_items_in_batch' (#36331, commented on Mar 25, 2025)
- The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct (#33399, commented on Mar 16, 2025)
- Offline mode doesn't work with models that require `trust_remote_code=True` (#34855, commented on Mar 16, 2025)
- ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [267,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed (#33985, commented on Mar 17, 2025)
- Multi-task classification and label_names on Trainer (#33193, commented on Mar 17, 2025)
- SDPA `is_causal=False` has no effect due to `LlamaModel._prepare_4d_causal_attention_mask_with_cache_position` (#36150, commented on Mar 17, 2025)
- Model trained with Flash Attention 2.0 raises "RuntimeError: query and key must have the same dtype" when generating (#30019, commented on Mar 18, 2025)
- Tensor size mismatch when trying to run RT-DETR on multiple GPUs (#33165, commented on Mar 18, 2025)
- Custom 4D tensor caused shape mismatch error (#35290, commented on Mar 18, 2025)
- Cryptic error when using AutoTokenizer with SentencePiece tokenizers without sentencepiece installed (#36291, commented on Mar 19, 2025)
- Warning that `MultiScaleDeformableAttention.so` is not found in `/root/.cache/torch_extensions` if `ninja` is installed alongside `transformers` (#35349, commented on Mar 19, 2025)
- Inconsistent output lengths when `max_length=20` is set implicitly vs explicitly in `generate()` (#35765, commented on Mar 19, 2025)
- `AutoModelForCasualLM.from_pretrained()` exits without warning/error (#36245, commented on Mar 19, 2025)
- IsADirectoryError when training with tqdm enabled for Trainer (#34766, commented on Mar 20, 2025)
- Incompatibility of flash_attention_2 + Llama + Transformers>=4.43 + autocast to fp16 (#36224, commented on Mar 20, 2025)
- modeling_phi3 errors with AttributeError: 'DynamicCache' object has no attribute 'get_max_length' (#36071, commented on Mar 20, 2025)
- ValueError: Unrecognized image processor in Qwen/Qwen2.5-VL-3B-Instruct (#36193, commented on Mar 21, 2025)
- cannot import name 'is_timm_config_dict' from 'transformers.utils.generic' (#36068, commented on Mar 21, 2025)
- Community contribution: adding GGUF support for more architectures (#33260, commented on Mar 21, 2025)
- Qwen2VLForConditionalGeneration doesn't work with MPS devices (#36413, commented on Mar 21, 2025)
- [Bugs] RuntimeError: No CUDA GPUs are available in transformers v4.48.0 or above when running the Ray RLHF example (#36295, commented on Mar 22, 2025)
- past_key_value(s) naming inconsistency causing problems (#36290, commented on Mar 22, 2025)
- model.gradient_checkpointing_enable() makes loss.requires_grad be False (#35826, commented on Mar 22, 2025)