Pulse · huggingface/transformers

March 21, 2025 – March 28, 2025

Overview

156 Active pull requests

78 Active issues

119 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Add Qwen2.5-Omni
#36752 commented on Mar 28, 2025 • 60 new comments
Add FAST
#35476 commented on Mar 27, 2025 • 44 new comments
Add Granite Speech Support
#36801 commented on Mar 27, 2025 • 33 new comments
Allow saving and loading multiple "raw" chat template files
#36588 commented on Mar 28, 2025 • 21 new comments
Add support for DeepseekAI's DeepseekVL
#36248 commented on Mar 28, 2025 • 20 new comments
Add InternVL (2.5 MPO)
#35968 commented on Mar 27, 2025 • 16 new comments
Adding Qwen3 and Qwen3MoE
#36878 commented on Mar 28, 2025 • 16 new comments
Add DeepSeek V2 Model into Transformers
#36400 commented on Mar 28, 2025 • 15 new comments
Add TimesFM Time Series Forecasting Model
#34082 commented on Mar 27, 2025 • 14 new comments
Add D-FINE Model into Transformers
#36261 commented on Mar 28, 2025 • 8 new comments
🔴 Video processors as a separate class
#35206 commented on Mar 28, 2025 • 6 new comments
[MLU] Fix FA2 check error, remove deepspeed-mlu deps.
#36159 commented on Mar 26, 2025 • 6 new comments
make `num_items_in_batch` optional in compute_loss_func
#36426 commented on Mar 26, 2025 • 5 new comments
Fix Mask2Former Weight Initialization Issues #35877
#35904 commented on Mar 24, 2025 • 5 new comments
Add Segment Anything 2 (SAM2)
#32317 commented on Mar 25, 2025 • 4 new comments
Use public export API on torch 2.5 and future
#36781 commented on Mar 28, 2025 • 4 new comments
Improve Model Download Speeds By ~3x For Large Models
#36870 commented on Mar 25, 2025 • 3 new comments
Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag
#36835 commented on Mar 25, 2025 • 3 new comments
enable tp on CPU
#36299 commented on Mar 27, 2025 • 1 new comment
[WIP]: Base multimodal model for VLLM's `transformers` backend
#36367 commented on Mar 26, 2025 • 1 new comment
Move `return_dict` logic into `can_return_tuple` decorator
#36838 commented on Mar 26, 2025 • 1 new comment
[Feature] Support using FlashAttention2 on Ascend NPU
#36696 commented on Mar 27, 2025 • 1 new comment
[generate] Run custom generation code from the Hub
#36405 commented on Mar 28, 2025 • 1 new comment
Add evolla rebase main
#36232 commented on Mar 24, 2025 • 1 new comment
Flash Attention v3
#36190 commented on Mar 24, 2025 • 1 new comment
Fix: Use config.use_sliding_window instead of config.sliding_window
#36377 commented on Mar 21, 2025 • 0 new comments
Fixed dynamic module import when there is more than one dot in class …
#36198 commented on Mar 24, 2025 • 0 new comments
Add MLCD model
#36182 commented on Mar 25, 2025 • 0 new comments
fix: support grad clipping for TP through replicating non-sharded modules
#36132 commented on Mar 25, 2025 • 0 new comments
Add Janus model
#36053 commented on Mar 27, 2025 • 0 new comments
Introduce modular files for speech models
#35902 commented on Mar 28, 2025 • 0 new comments
Add Doge model
#35891 commented on Mar 22, 2025 • 0 new comments
Add ColQwen2 to 🤗 transformers
#35778 commented on Mar 26, 2025 • 0 new comments
`GPT2Model` StaticCache support
#35761 commented on Mar 26, 2025 • 0 new comments
Pipeline: fix unnecessary warnings
#35753 commented on Mar 27, 2025 • 0 new comments
Process inputs directly in apply_chat_template in image-text-to-text pipeline
#35616 commented on Mar 24, 2025 • 0 new comments
Fix warning message for PEFT models in text-generation pipeline #36783
#36887 commented on Mar 25, 2025 • 0 new comments
Only count num items in batch when needed
#36867 commented on Mar 27, 2025 • 0 new comments
fix: prevent input side-effects in processor text args
#36866 commented on Mar 25, 2025 • 0 new comments
Add support for specifying revisions when pushing to Hub via internal Trainer call
#36852 commented on Mar 21, 2025 • 0 new comments
Dummies
#36827 commented on Mar 28, 2025 • 0 new comments
Support loading custom models (`trust_remote_code=True`) in offline mode from local
#36808 commented on Mar 24, 2025 • 0 new comments
Add long vita
#36807 commented on Mar 23, 2025 • 0 new comments
Fix warning message for PEFT models in text-generation pipeline #36783
#36868 commented on Mar 24, 2025 • 0 new comments
Refactor `return_dict` logic to remove complicated if/else paths
#36794 commented on Mar 28, 2025 • 0 new comments
🌐 [i18n-KO] Translated `qwen2_vl.md` to Korean
#36750 commented on Mar 24, 2025 • 0 new comments
Convert _VALID_DICT_FIELDS to class attribute for shared dict parsing in subclasses
#36736 commented on Mar 28, 2025 • 0 new comments
Add CSM model
#36719 commented on Mar 27, 2025 • 0 new comments
fix whisper re-compile
#36712 commented on Mar 28, 2025 • 0 new comments
[WIP] Add DINO DETR Model to HuggingFace Transformers
#36711 commented on Mar 26, 2025 • 0 new comments
Limit numpy version to <2.0.0
#36706 commented on Mar 24, 2025 • 0 new comments
prune LM Head for USD
#36695 commented on Mar 26, 2025 • 0 new comments
Support batch size > 1 image-text inference
#36682 commented on Mar 21, 2025 • 0 new comments
Fixes DynamicCache export issues due to control flow and inplace modifications
#36652 commented on Mar 25, 2025 • 0 new comments
Fix device issue in modeling_qwen2
#36647 commented on Mar 27, 2025 • 0 new comments
Refine parameter type annotations
#36644 commented on Mar 25, 2025 • 0 new comments
[WiP] Add Aimv2 model
#36625 commented on Mar 27, 2025 • 0 new comments
[WIP] Add support to load models with transforms
#36621 commented on Mar 23, 2025 • 0 new comments
Create and Expose SamVisionModel as public for better accessibility
#36493 commented on Mar 27, 2025 • 0 new comments
add FlashAttentionKwargs and seq_idx to flat collator
#36456 commented on Mar 28, 2025 • 0 new comments
Add PlainDETR
#36437 commented on Mar 25, 2025 • 0 new comments
Accelerate x Trainer issue tracker:
#33345 commented on Mar 25, 2025 • 0 new comments
ValueError: Trying to set a tensor of shape torch.Size([128256, 3072]) in "weight" (which has shape torch.Size([128003, 3072])), this looks incorrect
#36350 commented on Mar 25, 2025 • 0 new comments
Recomputed tensor size does not match when using activation checkpointing when using FSDP and accelerate
#34928 commented on Mar 25, 2025 • 0 new comments
Mask2Former _init_weights
#35877 commented on Mar 24, 2025 • 0 new comments
Support Flex Attention for encoder only models (XLMRoberta, ModernBERT etc...)
#36697 commented on Mar 24, 2025 • 0 new comments
Inference with FSDP during training affects checkpoints
#34530 commented on Mar 24, 2025 • 0 new comments
Whisper pipeline returns empty segment for each processed audio chunk
#36602 commented on Mar 24, 2025 • 0 new comments
Support for SpatialLM series model
#36874 commented on Mar 24, 2025 • 0 new comments
Multi-GPU training crashes with IterableDataset and different length input (e.g. Next token prediction)
#35308 commented on Mar 24, 2025 • 0 new comments
Gemma3 (and Paligemma) position_ids 1-indexed?
#36856 commented on Mar 24, 2025 • 0 new comments
Whisper word-level timestamp extraction fails with beam search
#36093 commented on Mar 24, 2025 • 0 new comments
Unable to use Seq2SeqTrainingArguments and Seq2SeqTrainer
#36330 commented on Mar 24, 2025 • 0 new comments
Assisted generation slower than with base model alone
#36337 commented on Mar 24, 2025 • 0 new comments
Add argument to set number of eval steps in Trainer
#31561 commented on Mar 24, 2025 • 0 new comments
Build for Windows and VS 2022 does not compile CUDA sources
#36830 commented on Mar 23, 2025 • 0 new comments
safetensor/mmap memory leak when per-layer weights are converted do other dtypes
#34366 commented on Mar 23, 2025 • 0 new comments
tensor parallel training bug
#36296 commented on Mar 23, 2025 • 0 new comments
Bug about num_update_steps_per_epoch in function _inner_training_loop
#36297 commented on Mar 23, 2025 • 0 new comments
DS3 zero3_save_16bit_model is not compatible with resume_from_checkpoint
#36317 commented on Mar 23, 2025 • 0 new comments
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration
#10105 commented on Mar 23, 2025 • 0 new comments
Inconsistent Documentation for `dataset_index` Requirement Across ViTPose Models
#36773 commented on Mar 22, 2025 • 0 new comments
Error during processing: MllamaForCausalLM does not support Flash Attention 2.0 yet.
#36557 commented on Mar 22, 2025 • 0 new comments
flash_attention_2 2.7.2.post1 seems to crash when using `torch.compile` and `DataCollatorWithFlattening`
#35588 commented on Mar 22, 2025 • 0 new comments
rework `test_multi_gpu_data_parallel_forward`
#31087 commented on Mar 22, 2025 • 0 new comments
model.gradient_checkpointing_enable() makes loss.requires_grad be False
#35826 commented on Mar 22, 2025 • 0 new comments
past_key_value(s) name inconsistency causing problems
#36290 commented on Mar 22, 2025 • 0 new comments
[Bugs] RuntimeError: No CUDA GPUs are available in transformers v4.48.0 or above when running Ray RLHF example
#36295 commented on Mar 22, 2025 • 0 new comments
Add Magma from Microsoft to Transformers
#36629 commented on Mar 21, 2025 • 0 new comments
Qwen2VLForConditionalGeneration doesn't work with MPS devices
#36413 commented on Mar 21, 2025 • 0 new comments
Integrate xlstm cleanly.
#35377 commented on Mar 28, 2025 • 0 new comments
Samhq model addition
#35147 commented on Mar 27, 2025 • 0 new comments
Update from pretrained error when loading
#33380 commented on Mar 24, 2025 • 0 new comments
Trainer: add predict with generate
#32346 commented on Mar 24, 2025 • 0 new comments
Support Kosmos-2.5
#31711 commented on Mar 27, 2025 • 0 new comments
SAM mask-generation - crops_n_layers
#35530 commented on Mar 28, 2025 • 0 new comments
When I use BF16 or FP16 to perform Lora fine-tuning on GemMA-3-12B-it, there will be an error when saving the checkpoint, but FP32 is normal
#36814 commented on Mar 28, 2025 • 0 new comments
modeling_deformable_detr.py DeformableDetrMultiheadAttention.foward function report error for "hidden_states_original" if position_embeddings is None
#36378 commented on Mar 27, 2025 • 0 new comments
Gemma3
#36815 commented on Mar 27, 2025 • 0 new comments
Qwen2-VL-7B-Instruct shape error when using TP=4
#36875 commented on Mar 27, 2025 • 0 new comments
Add Gemma 3 For Sequence Classification
#36755 commented on Mar 27, 2025 • 0 new comments
CVE-2024-11392 - AWS Scanner and Trivy Flagging Transformers 4.48.1 as Vulnerable
#36041 commented on Mar 27, 2025 • 0 new comments
Gemma3 can't be fine-tuned on multi-image examples
#36816 commented on Mar 27, 2025 • 0 new comments
:Cannot infer concrete type of torch.nn.Module
#36370 commented on Mar 27, 2025 • 0 new comments
Error From BitsandBytes
#36371 commented on Mar 27, 2025 • 0 new comments
Set non_blocking=True When moving data from the CPU to the GPU
#36384 commented on Mar 27, 2025 • 0 new comments
Cannot Load moonshotai/Moonlight-16B-A3B
#36385 commented on Mar 27, 2025 • 0 new comments
Making attention mechanism stackable
#36609 commented on Mar 26, 2025 • 0 new comments
torch_dtype is actually used now?
#36567 commented on Mar 26, 2025 • 0 new comments
AutoModel from_pretrained does not recursively download relative imports
#36653 commented on Mar 26, 2025 • 0 new comments
audio pipeline support for initial_prompt?
#27317 commented on Mar 26, 2025 • 0 new comments
warning bug in Qwen2DecoderLayer in transformers ==4.49
#36361 commented on Mar 26, 2025 • 0 new comments
The arguments in `utils/modular_model_converter.py` is different from those in docs
#36362 commented on Mar 26, 2025 • 0 new comments
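When using Ktransformers to run inference on the full DEEPSEEK-R1 model and its 4-bit quantized version, what is the inference speed in tokens/s, and what compute resource configuration does each require?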
#36363 commented on Mar 26, 2025 • 0 new comments
TypeError: CustomTrainer.compute_loss() got an unexpected keyword argument 'num_items_in_batch'
#36331 commented on Mar 25, 2025 • 0 new comments
Stop output to stdout in streamers.py methods
#36562 commented on Mar 25, 2025 • 0 new comments
AttributeError: 'dict' object has no attribute 'to_dict'; for Inferencing Lora Merged Qwen/Qwen2.5-VL-3B-Instruct
#36281 commented on Mar 25, 2025 • 0 new comments
`Helsinki-NLP/opus-mt-it-en` isn't on HuggingFace Hub
#26382 commented on Mar 25, 2025 • 0 new comments
multi-gpu: test_model_parallel_beam_search tests fail with "IndexError: list index out of range"
#35824 commented on Mar 25, 2025 • 0 new comments

2 Releases published by 1 person

80 Pull requests merged by 39 people

76 Pull requests opened by 55 people

43 Issues closed by 14 people

35 Issues opened by 33 people
