Insights: huggingface/transformers
Overview
2 Releases published by 1 person
-
v4.50.1 Patch release v4.50.1
published
Mar 25, 2025 -
v4.50.2 Patch release v4.50.2
published
Mar 27, 2025
80 Pull requests merged by 39 people
-
fix tied weights issue
#37031 merged
Mar 28, 2025 -
[WIP] add deepseek-v3
#35926 merged
Mar 28, 2025 -
[blip-2] Fix dtype mismatch when keep in fp32
#37068 merged
Mar 28, 2025 -
Change deprecated PT functions
#37041 merged
Mar 28, 2025 -
Fix some typos about benchmark scripts.
#37027 merged
Mar 28, 2025 -
Use `lru_cache` for tokenization tests
#36818 merged
Mar 28, 2025 -
fix: AttributeError: 'LlavaProcessor' object has no attribute 'image_token_id'
#37026 merged
Mar 28, 2025 -
Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0)
#36891 merged
Mar 28, 2025 -
fix: Fully remove legacy cache from Llama
#36958 merged
Mar 27, 2025 -
fixed typo
#37036 merged
Mar 27, 2025 -
Remove deprecated batch_size parameter
#37007 merged
Mar 27, 2025 -
Replace default split function with jnp.split() in flax models
#37001 merged
Mar 27, 2025 -
Set weights_only in torch.load
#36991 merged
Mar 27, 2025 -
Fix typing for None valued variables
#37004 merged
Mar 27, 2025 -
Avoid unnecessary device operations in loss computing
#36950 merged
Mar 27, 2025 -
clean pipeline question_answering.
#36986 merged
Mar 27, 2025 -
[generate, cache] handle more complex device maps
#37014 merged
Mar 27, 2025 -
[audio utils] fix fft_bin_width computation
#36603 merged
Mar 27, 2025 -
[chat templates] support loading audio from video
#36955 merged
Mar 27, 2025 -
Fixup for distill_any_depth conversion script
#37043 merged
Mar 27, 2025 -
Optimize `to_py_obj` for python-native numeric lists and scalars
#36885 merged
Mar 27, 2025 -
fix pegasus init weights and other copied models
#36844 merged
Mar 27, 2025 -
Add Distill Any Depth
#36614 merged
Mar 27, 2025 -
Skip FP8 linear tests
#37008 merged
Mar 27, 2025 -
remove redundant code in trainer
#36994 merged
Mar 27, 2025 -
Mark 2 tests as flaky for now
#37038 merged
Mar 27, 2025 -
[Modeling] Load FP8 safetensors such as DeepSeek
#36828 merged
Mar 27, 2025 -
Fix PixtralProcessor patch_size when spatial_merge_size is used
#37019 merged
Mar 27, 2025 -
Support QuestionAnswering Module for ModernBert based models.
#35566 merged
Mar 26, 2025 -
fix transformers_cli import relative path issue
#36989 merged
Mar 26, 2025 -
[docs] Attention mask image
#36970 merged
Mar 26, 2025 -
Remove deprecated training arguments
#36946 merged
Mar 26, 2025 -
fix typos in the code comments and error messages
#36993 merged
Mar 26, 2025 -
Log the correct learning rate
#36973 merged
Mar 26, 2025 -
Fix device_map check for ggml files
#37003 merged
Mar 26, 2025 -
Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support.
#36975 merged
Mar 26, 2025 -
Allow easy registration of custom attention functions
#36889 merged
Mar 26, 2025 -
Fix get_device_properties
#36997 merged
Mar 26, 2025 -
Fix Optional type annotation
#36841 merged
Mar 26, 2025 -
Install `networkx==3.2.1` manually in some CircleCI jobs after #36957
#37000 merged
Mar 26, 2025 -
Use torch.expm1
#36995 merged
Mar 26, 2025 -
byebye CircleCI TF jobs
#36998 merged
Mar 26, 2025 -
Fix tensor dtype mismatch
#36985 merged
Mar 26, 2025 -
🚨Deprecate legacy argument for image-text-to-text models and adopt new behavior by default
#36307 merged
Mar 25, 2025 -
update bot comment again
#36974 merged
Mar 25, 2025 -
Add ruff target-version
#36971 merged
Mar 25, 2025 -
[docs] Fix image link
#36869 merged
Mar 25, 2025 -
Remove extra tensor clone in PyTorch code
#36748 merged
Mar 25, 2025 -
update
#36972 merged
Mar 25, 2025 -
Updated docker files to use `uv` for installing packages
#36957 merged
Mar 25, 2025 -
typo fixed in README_fr.md
#36951 merged
Mar 25, 2025 -
Change GPUS to GPUs
#36945 merged
Mar 25, 2025 -
Update after #36962
#36965 merged
Mar 25, 2025 -
Update ruff to `0.11.2`
#36962 merged
Mar 25, 2025 -
[Utils] torch version checks optionally accept dev versions
#36847 merged
Mar 25, 2025 -
Fix cuda index issue in cache allocator
#36937 merged
Mar 25, 2025 -
Support `return_tensors` in audio chat templates
#34601 merged
Mar 25, 2025 -
fix typos in the tests directory
#36932 merged
Mar 25, 2025 -
Export for Phi4-mini
#36780 merged
Mar 25, 2025 -
Fixing _pre_quantization_dtype when torch_dtype is None
#36930 merged
Mar 25, 2025 -
Add Phi4 multimodal
#36939 merged
Mar 25, 2025 -
Deprecate #36741 and map Causal to Conditional
#36917 merged
Mar 25, 2025 -
Disallow Offload to disk for gguf files
#36933 merged
Mar 24, 2025 -
Fix processor kwargs qwen2 vl
#36890 merged
Mar 24, 2025 -
Added support for seed in `DataCollatorForWholeWordMask`
#36903 merged
Mar 24, 2025 -
More precise comment
#36935 merged
Mar 24, 2025 -
Fix pytorch deform attn path
#36923 merged
Mar 24, 2025 -
[2/N] Use pyupgrade --py39-plus to improve code
#36857 merged
Mar 24, 2025 -
Update `trainer_pt_utils.py` docstrings for consistency
#36912 merged
Mar 24, 2025 -
Fix typos
#36910 merged
Mar 24, 2025 -
Use another repo. for Mistral3 processor testing
#36925 merged
Mar 24, 2025 -
Fix Compressed tensors to_dict_diff
#36922 merged
Mar 24, 2025 -
[chameleon] fix num image token check
#36918 merged
Mar 24, 2025 -
tests: fix asyncio.wait() usage for python>=3.11
#36898 merged
Mar 24, 2025 -
[Fix] Add `original_max_position_embeddings` to YARN rope_scaling optional keys
#36877 merged
Mar 24, 2025 -
Fix torch version guard at import
#36907 merged
Mar 24, 2025 -
fix Gemma3 Config
#36893 merged
Mar 24, 2025 -
Update installation.md
#36826 merged
Mar 21, 2025 -
[docs] Model docs
#36469 merged
Mar 21, 2025 -
Fix Pan and Scan on batched images Gemma3
#36864 merged
Mar 21, 2025
76 Pull requests opened by 55 people
-
[WIP] Computer vision util: vision visualizer
#36892 opened
Mar 21, 2025 -
Enable tracing for Moshi
#36894 opened
Mar 21, 2025 -
Add RF-DETR
#36895 opened
Mar 21, 2025 -
Adding ArlowGPT
#36899 opened
Mar 22, 2025 -
Add NGen3
#36901 opened
Mar 22, 2025 -
LogfireCallback: Integrating Logfire with Hugging Face’s Trainer
#36905 opened
Mar 22, 2025 -
fix cached file error when repo type is dataset
#36909 opened
Mar 23, 2025 -
Limit number of evaluation samples processed during training
#36916 opened
Mar 24, 2025 -
[qwen2-audio] remove default template
#36919 opened
Mar 24, 2025 -
Allow disabling `deformable_detr` kernels
#36927 opened
Mar 24, 2025 -
Remove the redundant shift during the loss computation in the Moshi m…
#36928 opened
Mar 24, 2025 -
Aligning modeling code for GPT2 to work with vLLM (fallback)
#36934 opened
Mar 24, 2025 -
[3/N] Use pyupgrade --py39-plus to improve code
#36936 opened
Mar 24, 2025 -
Static cache should support indexing
#36943 opened
Mar 24, 2025 -
Improve typing in TrainingArgument
#36944 opened
Mar 25, 2025 -
fix(qwen): fix shape error when using tp
#36947 opened
Mar 25, 2025 -
Update image_processing_qwen2_vl.py。fix bug.
#36948 opened
Mar 25, 2025 -
Added Sapnous Architecture
#36952 opened
Mar 25, 2025 -
Skip code `307` in `RequestCounter`
#36953 opened
Mar 25, 2025 -
Remove low_cpu_mem_usage and _fast_init
#36963 opened
Mar 25, 2025 -
More ReDOS fixes!
#36964 opened
Mar 25, 2025 -
[phi-4] use mel filters from audio utils
#36966 opened
Mar 25, 2025 -
Add new dim to `num_items_in_batch` if necessary
#36967 opened
Mar 25, 2025 -
Make executorch integration more seamless by analyzing model signature
#36969 opened
Mar 25, 2025 -
Refactor image processor phi4
#36976 opened
Mar 25, 2025 -
Add device workaround for int4 weight only quantization after API update
#36980 opened
Mar 25, 2025 -
Refactor attention for SigLIP based models
#36981 opened
Mar 25, 2025 -
fix comment misdirection during scaling loss
#36987 opened
Mar 26, 2025 -
Gaudi: Fix the pipeline failed issue with hpu device
#36990 opened
Mar 26, 2025 -
fix and enhance pipeline_webserver.md
#36992 opened
Mar 26, 2025 -
[Phi4] add multimodal chat template
#36996 opened
Mar 26, 2025 -
Add Fast SamImageProcessor
#36999 opened
Mar 26, 2025 -
[Fast Processor] BEiT
#37005 opened
Mar 26, 2025 -
Export Whisper to ExecuTorch
#37009 opened
Mar 26, 2025 -
Fix AttentionInterface following feedback
#37010 opened
Mar 26, 2025 -
Add Fast Chinese-CLIP Processor
#37012 opened
Mar 26, 2025 -
Add args support for fast image processors
#37018 opened
Mar 26, 2025 -
Add support for fast image processing in image-pretraining example
#37021 opened
Mar 26, 2025 -
Add py.typed
#37022 opened
Mar 27, 2025 -
Add Fast Image Processor for Video-LLaVA
#37023 opened
Mar 27, 2025 -
Add Fast Segformer Processor
#37024 opened
Mar 27, 2025 -
fix best_model_checkpoint is None issue when distributed training
#37025 opened
Mar 27, 2025 -
add gpt2 test on XPU
#37028 opened
Mar 27, 2025 -
[tests] remove cuda-only test marker
#37032 opened
Mar 27, 2025 -
🔴 [VLM] Add base model without head
#37033 opened
Mar 27, 2025 -
Fix torchao usage
#37034 opened
Mar 27, 2025 -
Support passing flash_attn_kwargs when gradient_checkpointing is enabled
#37037 opened
Mar 27, 2025 -
Add pyupdate to ruff rules
#37039 opened
Mar 27, 2025 -
Updated the model card for CLIP
#37040 opened
Mar 27, 2025 -
[Cache] rename dtype attribute 🚨 🚨
#37044 opened
Mar 27, 2025 -
Add Fast Image Processor for Idefics3
#37045 opened
Mar 27, 2025 -
🚨 🚨 🚨 No more pointing at remote repos
#37047 opened
Mar 27, 2025 -
Adding a stub for MiniCPM-o to the models
#37049 opened
Mar 27, 2025 -
Update Model Card for ModernBERT
#37052 opened
Mar 27, 2025 -
Add Idefics2 Fast ImageProcessor
#37053 opened
Mar 27, 2025 -
(Part 2) feat: allow for tp_size attr for tplizing the model
#37054 opened
Mar 27, 2025 -
Add EfficientNet Image PreProcessor
#37055 opened
Mar 27, 2025 -
Update model card for Cohere
#37056 opened
Mar 28, 2025 -
fixed typo.
#37057 opened
Mar 28, 2025 -
Remove deprecated code
#37059 opened
Mar 28, 2025 -
Fix more inefficient PT operations
#37060 opened
Mar 28, 2025 -
Add weights_only=True to torch.load
#37062 opened
Mar 28, 2025 -
Update model card for electra
#37063 opened
Mar 28, 2025 -
Update model card for Depth Anything
#37065 opened
Mar 28, 2025 -
Fix 4090/ada not detected as having FP8 support
#37067 opened
Mar 28, 2025 -
🌐 [i18n-KO] Translated `roberta.md` to Korean
#37069 opened
Mar 28, 2025 -
Detect and fix all `_init_weights()` issues
#37070 opened
Mar 28, 2025 -
Add Fast Conditional-DETR Processor
#37071 opened
Mar 28, 2025 -
Reverse dependency map shouldn't be created when test_all is set
#37072 opened
Mar 28, 2025 -
Improvements in Gemma2 model card
#37076 opened
Mar 28, 2025 -
Fix: Unexpected Keys, Improve `run_compressed`, Rename Test Folder
#37077 opened
Mar 28, 2025 -
[generate] beam search -- fix output cropping
#37080 opened
Mar 28, 2025 -
Add Fast Image Processor for Donut
#37081 opened
Mar 28, 2025 -
[draft] random tests order
#37082 opened
Mar 28, 2025 -
Test cleanup
#37083 opened
Mar 28, 2025
43 Issues closed by 14 people
-
llama `tie_word_embeddings` ignored on cpu and with auto dtype only
#33689 closed
Mar 28, 2025 -
DeepSeek V3 Support
#35425 closed
Mar 28, 2025 -
Incompatibility in flash_attention_2 + Llama + Transformers>=4.43 + Autocast to fp16
#36224 closed
Mar 28, 2025 -
omlab/omdet-turbo-swin-tiny-hf from_pretrained fails to build model
#37016 closed
Mar 27, 2025 -
Optimize tokenizer.decode() Performance for `List[int]` Inputs
#36872 closed
Mar 27, 2025 -
Support Distill Depth Anything
#36499 closed
Mar 27, 2025 -
About Siglip2 feature selection
#36382 closed
Mar 27, 2025 -
TypeError: llama_flash_attn_forward() got an unexpected keyword argument 'cache_position'
#37030 closed
Mar 27, 2025 -
python_interpreter.py seems not support asyncio.run()
#36920 closed
Mar 27, 2025 -
Learning rate logging off by one training step
#35942 closed
Mar 26, 2025 -
ValueError: `run_compressed` is only supported for quantized_compressed models
#36915 closed
Mar 26, 2025 -
Recent update: configuration_eurobert.py not found -
#36983 closed
Mar 26, 2025 -
Issue with Progressive Generation Using inputs_embeds and past_key_values
#35707 closed
Mar 26, 2025 -
RWKV CUDA error: an illegal memory access was encountered during training from scratch
#35805 closed
Mar 26, 2025 -
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 closed
Mar 26, 2025 -
Token healing throws error with "Qwen/Qwen2.5-Coder-7B-Instruct"
#36210 closed
Mar 26, 2025 -
[bug] use_gather_object is not respected after the first eval in trainer
#36213 closed
Mar 26, 2025 -
Error: TypeError: argument 'ids': 'float' object cannot be interpreted as an integer
#36984 closed
Mar 26, 2025 -
Clarification on Commercial License Impact of LayoutLMv3ImageProcessor within UdopProcessor
#36931 closed
Mar 25, 2025 -
ImportError: cannot import name 'AdamW' from 'transformers'
#36954 closed
Mar 25, 2025 -
AutoTokenizer/Processor does not work with Mistral3 models
#36968 closed
Mar 25, 2025 -
Ruff update
#36705 closed
Mar 25, 2025 -
torchrun breaks with load_model_at_end and with metric_for_best_model=eval_f1 on question_answering example
#30819 closed
Mar 25, 2025 -
`Mllama` not supported by `AutoModelForCausalLM` after updating `transformers` to `4.50.0`
#36926 closed
Mar 25, 2025 -
Florence2 stopped working after upgrade to 4.50.0 ("Unrecognized configuration class")
#36886 closed
Mar 25, 2025 -
Design question for integrating new model to Transformers?
#36784 closed
Mar 25, 2025 -
Add seed to data collator classes
#36655 closed
Mar 24, 2025 -
Torch -> ONNX doesn't work after upgrading transformers to 4.49.0
#36276 closed
Mar 24, 2025 -
<spam>
#36924 closed
Mar 24, 2025 -
llama tokenizer encode -> decode is not same
#36325 closed
Mar 24, 2025 -
tj-actions/changed-files action compromised
#36761 closed
Mar 24, 2025 -
Some of test/utils tests fail being invalidated by tests/utils/test_import_utils.py::test_clear_import_cache
#36334 closed
Mar 24, 2025 -
MacOs: register_pytree_node got an unexpected keyword argument 'flatten_with_keys_fn'
#36906 closed
Mar 24, 2025 -
Issue with update
#36888 closed
Mar 24, 2025 -
Trainer: TensorBoardCallback not working for "on_save" and "on_save_end" events
#35612 closed
Mar 24, 2025 -
Pipeline cannot guess which processor to use with Gemma 3
#36911 closed
Mar 23, 2025 -
Unable to export GLM models to ONNX
#35021 closed
Mar 23, 2025 -
`modular_model_converter` can not handle objects import via try - except
#35414 closed
Mar 23, 2025 -
`TFViTModel` and `interpolate_pos_encoding=True`
#36155 closed
Mar 23, 2025 -
[BART] Cannot copy out of meta tensor; no data!
#36247 closed
Mar 21, 2025
35 Issues opened by 33 people
-
Need to get hidden features from Siglip but ValueError: You have to specify input_ids
#37079 opened
Mar 28, 2025 -
Do not update cache when use_cache=False and past_key_values are provided?
#37078 opened
Mar 28, 2025 -
A TypeError in modeling_utils.caching_allocator_warmup function
#37074 opened
Mar 28, 2025 -
a logic error in _preprocess function of Qwen2VLImageProcessor Class
#37064 opened
Mar 28, 2025 -
AutoTrain Unsloth support
#37050 opened
Mar 27, 2025 -
Persistent generation issues with MT5 models (base and fine-tuned) across environments
#37048 opened
Mar 27, 2025 -
Optionality of `attention_mask` argument in Attention classes/functions.
#37046 opened
Mar 27, 2025 -
Latest TorchAO config breaks serialization
#37035 opened
Mar 27, 2025 -
add MiniCPM-o
#37029 opened
Mar 27, 2025 -
run_mim.py script from image-pretraining example is not working
#37020 opened
Mar 26, 2025 -
Add NeoBERT
#37015 opened
Mar 26, 2025 -
Gemma3 adding new tokens <image_soft_token> has been added accidentally
#37011 opened
Mar 26, 2025 -
[Question] Handling of custom flex attention block masks
#37006 opened
Mar 26, 2025 -
GGUF model with architecture gemma3 is not supported yet.
#37002 opened
Mar 26, 2025 -
Add ArlowGPT
#36988 opened
Mar 26, 2025 -
FSDP Not Working For Mamba2
#36982 opened
Mar 25, 2025 -
[Community contributions] Model cards
#36979 opened
Mar 25, 2025 -
[Contributions Welcome] Add Fast Image Processors
#36978 opened
Mar 25, 2025 -
QuestionAnswering for Gemma 3
#36977 opened
Mar 25, 2025 -
Gemma3: Cuda error: misaligned address
#36961 opened
Mar 25, 2025 -
Symbolic trace with past_key_values input is not supported yet for the qwen2.
#36959 opened
Mar 25, 2025 -
Started getting new warnings for gemma3 after upgrading from 4.49.0-gemma3 to 4.50.0
#36942 opened
Mar 24, 2025 -
Add param_to_hook_all_reduce parameter in HF Trainer
#36941 opened
Mar 24, 2025 -
Gemma3 not supported in main branch
#36940 opened
Mar 24, 2025 -
AttributeError: 'HybridCache' object has no attribute 'float' — PaliGemma2 Evaluation Fails with BF16
#36938 opened
Mar 24, 2025 -
'Cache only has 0 layers' during generation after upgrading Transformers from 4.49 to 4.50
#36913 opened
Mar 24, 2025 -
`lm_head.weight` missing from `convert_mistral_weights_to_hf.STATE_DICT_MAPPING`
#36908 opened
Mar 23, 2025 -
PixtralVisionModel does not support Flash Attention 2.0 yet
#36904 opened
Mar 22, 2025 -
Warning: "No label_names provided for PeftModel" persists despite dataset containing "labels" column
#36902 opened
Mar 22, 2025 -
groot n1
#36900 opened
Mar 22, 2025 -
GPT2Model model output inconsistency between different transformers versions
#36897 opened
Mar 22, 2025 -
Forced to hit `UserWarning` when generating with `temperature=0`
#36896 opened
Mar 21, 2025
119 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Qwen2.5-Omni
#36752 commented on
Mar 28, 2025 • 60 new comments -
Add FAST
#35476 commented on
Mar 27, 2025 • 44 new comments -
Add Granite Speech Support
#36801 commented on
Mar 27, 2025 • 33 new comments -
Allow saving and loading multiple "raw" chat template files
#36588 commented on
Mar 28, 2025 • 21 new comments -
Add support for DeepseekAI's DeepseekVL
#36248 commented on
Mar 28, 2025 • 20 new comments -
Add InternVL (2.5 MPO)
#35968 commented on
Mar 27, 2025 • 16 new comments -
Adding Qwen3 and Qwen3MoE
#36878 commented on
Mar 28, 2025 • 16 new comments -
Add DeepSeek V2 Model into Transformers
#36400 commented on
Mar 28, 2025 • 15 new comments -
Add TimesFM Time Series Forecasting Model
#34082 commented on
Mar 27, 2025 • 14 new comments -
Add D-FINE Model into Transformers
#36261 commented on
Mar 28, 2025 • 8 new comments -
🔴 Video processors as a separate class
#35206 commented on
Mar 28, 2025 • 6 new comments -
[MLU] Fix FA2 check error, remove deepspeed-mlu deps.
#36159 commented on
Mar 26, 2025 • 6 new comments -
make `num_items_in_batch` optional in compute_loss_func
#36426 commented on
Mar 26, 2025 • 5 new comments -
Fix Mask2Former Weight Initialization Issues #35877
#35904 commented on
Mar 24, 2025 • 5 new comments -
Add Segment Anything 2 (SAM2)
#32317 commented on
Mar 25, 2025 • 4 new comments -
Use public export API on torch 2.5 and future
#36781 commented on
Mar 28, 2025 • 4 new comments -
Improve Model Download Speeds By ~3x For Large Models
#36870 commented on
Mar 25, 2025 • 3 new comments -
Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag
#36835 commented on
Mar 25, 2025 • 3 new comments -
enable tp on CPU
#36299 commented on
Mar 27, 2025 • 1 new comment -
[WIP]: Base multimodal model for VLLM's `transformers` backend
#36367 commented on
Mar 26, 2025 • 1 new comment -
Move `return_dict` logic into `can_return_tuple` decorator
#36838 commented on
Mar 26, 2025 • 1 new comment -
[Feature] Support using FlashAttention2 on Ascend NPU
#36696 commented on
Mar 27, 2025 • 1 new comment -
[generate] Run custom generation code from the Hub
#36405 commented on
Mar 28, 2025 • 1 new comment -
Add evolla rebase main
#36232 commented on
Mar 24, 2025 • 1 new comment -
Flash Attention v3
#36190 commented on
Mar 24, 2025 • 1 new comment -
Fix: Use config.use_sliding_window instead of config.sliding_window
#36377 commented on
Mar 21, 2025 • 0 new comments -
Fixed dynamic module import when there is more than one dot in class …
#36198 commented on
Mar 24, 2025 • 0 new comments -
Add MLCD model
#36182 commented on
Mar 25, 2025 • 0 new comments -
fix: support grad clipping for TP through replicating non-sharded modules
#36132 commented on
Mar 25, 2025 • 0 new comments -
Add Janus model
#36053 commented on
Mar 27, 2025 • 0 new comments -
Introduce modular files for speech models
#35902 commented on
Mar 28, 2025 • 0 new comments -
Add Doge model
#35891 commented on
Mar 22, 2025 • 0 new comments -
Add ColQwen2 to 🤗 transformers
#35778 commented on
Mar 26, 2025 • 0 new comments -
`GPT2Model` StaticCache support
#35761 commented on
Mar 26, 2025 • 0 new comments -
Pipeline: fix unnecessary warnings
#35753 commented on
Mar 27, 2025 • 0 new comments -
Process inputs directly in apply_chat_template in image-text-to-text pipeline
#35616 commented on
Mar 24, 2025 • 0 new comments -
Fix warning message for PEFT models in text-generation pipeline #36783
#36887 commented on
Mar 25, 2025 • 0 new comments -
Only count num items in batch when needed
#36867 commented on
Mar 27, 2025 • 0 new comments -
fix: prevent input side-effects in processor text args
#36866 commented on
Mar 25, 2025 • 0 new comments -
Add support for specifying revisions when pushing to Hub via internal Trainer call
#36852 commented on
Mar 21, 2025 • 0 new comments -
Dummies
#36827 commented on
Mar 28, 2025 • 0 new comments -
Support loading custom models (`trust_remote_code=True`) in offline mode from local
#36808 commented on
Mar 24, 2025 • 0 new comments -
Add long vita
#36807 commented on
Mar 23, 2025 • 0 new comments -
Fix warning message for PEFT models in text-generation pipeline #36783
#36868 commented on
Mar 24, 2025 • 0 new comments -
Refactor `return_dict` logic to remove complicated if/else paths
#36794 commented on
Mar 28, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `qwen2_vl.md` to Korean
#36750 commented on
Mar 24, 2025 • 0 new comments -
Convert _VALID_DICT_FIELDS to class attribute for shared dict parsing in subclasses
#36736 commented on
Mar 28, 2025 • 0 new comments -
Add CSM model
#36719 commented on
Mar 27, 2025 • 0 new comments -
fix whisper re-compile
#36712 commented on
Mar 28, 2025 • 0 new comments -
[WIP] Add DINO DETR Model to HuggingFace Transformers
#36711 commented on
Mar 26, 2025 • 0 new comments -
Limit numpy version to <2.0.0
#36706 commented on
Mar 24, 2025 • 0 new comments -
prune LM Head for USD
#36695 commented on
Mar 26, 2025 • 0 new comments -
Support batch size > 1 image-text inference
#36682 commented on
Mar 21, 2025 • 0 new comments -
Fixes DynamicCache export issues due to control flow and inplace modifications
#36652 commented on
Mar 25, 2025 • 0 new comments -
Fix device issue in modeling_qwen2
#36647 commented on
Mar 27, 2025 • 0 new comments -
Refine parameter type annotations
#36644 commented on
Mar 25, 2025 • 0 new comments -
[WiP] Add Aimv2 model
#36625 commented on
Mar 27, 2025 • 0 new comments -
[WIP] Add support to load models with transforms
#36621 commented on
Mar 23, 2025 • 0 new comments -
Create and Expose SamVisionModel as public for better accessibility
#36493 commented on
Mar 27, 2025 • 0 new comments -
add FlashAttentionKwargs and seq_idx to flat collator
#36456 commented on
Mar 28, 2025 • 0 new comments -
Add PlainDETR
#36437 commented on
Mar 25, 2025 • 0 new comments -
Accelerate x Trainer issue tracker:
#33345 commented on
Mar 25, 2025 • 0 new comments -
ValueError: Trying to set a tensor of shape torch.Size([128256, 3072]) in "weight" (which has shape torch.Size([128003, 3072])), this looks incorrect
#36350 commented on
Mar 25, 2025 • 0 new comments -
Recomputed tensor size does not match when using activation checkpointing when using FSDP and accelerate
#34928 commented on
Mar 25, 2025 • 0 new comments -
Mask2Former _init_weights
#35877 commented on
Mar 24, 2025 • 0 new comments -
Support Flex Attention for encoder only models (XLMRoberta, ModernBERT etc...)
#36697 commented on
Mar 24, 2025 • 0 new comments -
Inference with FSDP during training affects checkpoints
#34530 commented on
Mar 24, 2025 • 0 new comments -
Whisper pipeline returns empty segment for each processed audio chunk
#36602 commented on
Mar 24, 2025 • 0 new comments -
Support for SpatialLM series model
#36874 commented on
Mar 24, 2025 • 0 new comments -
Multi-GPU training crashes with IterableDataset and different length input (e.g. Next token prediction)
#35308 commented on
Mar 24, 2025 • 0 new comments -
Gemma3 (and Paligemma) position_ids 1-indexed?
#36856 commented on
Mar 24, 2025 • 0 new comments -
Whisper word-level timestamp extraction fails with beam search
#36093 commented on
Mar 24, 2025 • 0 new comments -
Unable to use Seq2SeqTrainingArguments and Seq2SeqTrainer
#36330 commented on
Mar 24, 2025 • 0 new comments -
Assisted generation slower than with base model alone
#36337 commented on
Mar 24, 2025 • 0 new comments -
Add argument to set number of eval steps in Trainer
#31561 commented on
Mar 24, 2025 • 0 new comments -
Build for Windows and VS 2022 does not compile CUDA sources
#36830 commented on
Mar 23, 2025 • 0 new comments -
safetensor/mmap memory leak when per-layer weights are converted to other dtypes
#34366 commented on
Mar 23, 2025 • 0 new comments -
tensor parallel training bug
#36296 commented on
Mar 23, 2025 • 0 new comments -
Bug about num_update_steps_per_epoch in function _inner_training_loop
#36297 commented on
Mar 23, 2025 • 0 new comments -
DS3 zero3_save_16bit_model is not compatible with resume_from_checkpoint
#36317 commented on
Mar 23, 2025 • 0 new comments -
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration
#10105 commented on
Mar 23, 2025 • 0 new comments -
Inconsistent Documentation for `dataset_index` Requirement Across ViTPose Models
#36773 commented on
Mar 22, 2025 • 0 new comments -
Error during processing: MllamaForCausalLM does not support Flash Attention 2.0 yet.
#36557 commented on
Mar 22, 2025 • 0 new comments -
flash_attention_2 2.7.2.post1 seems to crash when using `torch.compile` and `DataCollatorWithFlattening`
#35588 commented on
Mar 22, 2025 • 0 new comments -
rework `test_multi_gpu_data_parallel_forward`
#31087 commented on
Mar 22, 2025 • 0 new comments -
model.gradient_checkpointing_enable() makes loss.requires_grad be False
#35826 commented on
Mar 22, 2025 • 0 new comments -
past_key_value(s) name inconsistency causing problems
#36290 commented on
Mar 22, 2025 • 0 new comments -
[Bugs] RuntimeError: No CUDA GPUs are available in transformers v4.48.0 or above when running Ray RLHF example
#36295 commented on
Mar 22, 2025 • 0 new comments -
Add Magma from Microsoft to Transformers
#36629 commented on
Mar 21, 2025 • 0 new comments -
Qwen2VLForConditionalGeneration doesn't work with MPS devices
#36413 commented on
Mar 21, 2025 • 0 new comments -
Integrate xlstm cleanly.
#35377 commented on
Mar 28, 2025 • 0 new comments -
Samhq model addition
#35147 commented on
Mar 27, 2025 • 0 new comments -
Update from pretrained error when loading
#33380 commented on
Mar 24, 2025 • 0 new comments -
Trainer: add predict with generate
#32346 commented on
Mar 24, 2025 • 0 new comments -
Support Kosmos-2.5
#31711 commented on
Mar 27, 2025 • 0 new comments -
SAM mask-generation - crops_n_layers
#35530 commented on
Mar 28, 2025 • 0 new comments -
When I use BF16 or FP16 to perform Lora fine-tuning on GemMA-3-12B-it, there will be an error when saving the checkpoint, but FP32 is normal
#36814 commented on
Mar 28, 2025 • 0 new comments -
modeling_deformable_detr.py DeformableDetrMultiheadAttention.foward function report error for "hidden_states_original" if position_embeddings is None
#36378 commented on
Mar 27, 2025 • 0 new comments -
Gemma3
#36815 commented on
Mar 27, 2025 • 0 new comments -
Qwen2-VL-7B-Instruct shape error when using TP=4
#36875 commented on
Mar 27, 2025 • 0 new comments -
Add Gemma 3 For Sequence Classification
#36755 commented on
Mar 27, 2025 • 0 new comments -
CVE-2024-11392 - AWS Scanner and Trivy Flagging Transformers 4.48.1 as Vulnerable
#36041 commented on
Mar 27, 2025 • 0 new comments -
Gemma3 can't be fine-tuned on multi-image examples
#36816 commented on
Mar 27, 2025 • 0 new comments -
:Cannot infer concrete type of torch.nn.Module
#36370 commented on
Mar 27, 2025 • 0 new comments -
Error From BitsandBytes
#36371 commented on
Mar 27, 2025 • 0 new comments -
Set non_blocking=True When moving data from the CPU to the GPU
#36384 commented on
Mar 27, 2025 • 0 new comments -
Cannot Load moonshotai/Moonlight-16B-A3B
#36385 commented on
Mar 27, 2025 • 0 new comments -
Making attention mechanism stackable
#36609 commented on
Mar 26, 2025 • 0 new comments -
torch_dtype is actually used now?
#36567 commented on
Mar 26, 2025 • 0 new comments -
AutoModel from_pretrained does not recursively download relative imports
#36653 commented on
Mar 26, 2025 • 0 new comments -
audio pipeline support for initial_prompt?
#27317 commented on
Mar 26, 2025 • 0 new comments -
warning bug in Qwen2DecoderLayer in transformers ==4.49
#36361 commented on
Mar 26, 2025 • 0 new comments -
The arguments in `utils/modular_model_converter.py` is different from those in docs
#36362 commented on
Mar 26, 2025 • 0 new comments -
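When running inference with Ktransformers on the full DEEPSEEK-R1 model and its 4-bit quantized version, what is the inference speed in tokens/s, and what compute resource configuration does each require?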
#36363 commented on
Mar 26, 2025 • 0 new comments -
TypeError: CustomTrainer.compute_loss() got an unexpected keyword argument 'num_items_in_batch'
#36331 commented on
Mar 25, 2025 • 0 new comments -
Stop output to stdout in streamers.py methods
#36562 commented on
Mar 25, 2025 • 0 new comments -
AttributeError: 'dict' object has no attribute 'to_dict'; for Inferencing Lora Merged Qwen/Qwen2.5-VL-3B-Instruct
#36281 commented on
Mar 25, 2025 • 0 new comments -
`Helsinki-NLP/opus-mt-it-en` isn't on HuggingFace Hub
#26382 commented on
Mar 25, 2025 • 0 new comments -
multi-gpu: test_model_parallel_beam_search tests fail with "IndexError: list index out of range"
#35824 commented on
Mar 25, 2025 • 0 new comments