Insights: huggingface/transformers
Overview
2 Releases published by 1 person
-
v4.52.4 Patch release: v4.52.4
published
May 30, 2025 -
v4.52.4-ColQwen2-preview ColQwen2 (based on v4.52.4)
published
Jun 2, 2025
61 Pull requests merged by 39 people
-
Updated deprecated typing imports with equivalents for Python 3.9+
#38546 merged
Jun 4, 2025 -
New gpt neo model card
#38505 merged
Jun 4, 2025 -
tests/roformer: fix couple roformer tests on gpus
#38570 merged
Jun 4, 2025 -
[Dinov2] Enable device_map="auto" support
#38487 merged
Jun 4, 2025 -
Add Expectations for three AMD tests
#38581 merged
Jun 4, 2025 -
Pin Scipy version to >=1.12.0
#38469 merged
Jun 4, 2025 -
Janus seamless transfer
#38580 merged
Jun 4, 2025 -
feat: add repository field to benchmarks table
#38582 merged
Jun 4, 2025 -
Docs: fix code formatting in torchao docs
#38504 merged
Jun 4, 2025 -
allow custom head_dim for qwen2_moe
#37188 merged
Jun 4, 2025 -
fix(attention_visualizer): add default value for image_seq_length
#38577 merged
Jun 4, 2025 -
[FlexAttn] Fix models with unique characteristics
#38433 merged
Jun 4, 2025 -
Expectation changes and more AMD expectations
#38529 merged
Jun 4, 2025 -
Fix deepseekv3
#38562 merged
Jun 4, 2025 -
update utils/notification_service.py for AMD vs Nvidia
#38563 merged
Jun 4, 2025 -
Fix chameleon tests
#38565 merged
Jun 4, 2025 -
Add support for MiniMax's MiniMax-Text-01
#35831 merged
Jun 4, 2025 -
Relaxed the output_attention condition for ValueError
#38560 merged
Jun 4, 2025 -
[janus] Fix failing tests on mi3XX
#38426 merged
Jun 4, 2025 -
Added guards against device mismatch errors in some LogitsProcessor classes
#38558 merged
Jun 4, 2025 -
Fixed a multiple-devices issue in SmolVLMModel
#38557 merged
Jun 4, 2025 -
[docs] Format fix
#38414 merged
Jun 3, 2025 -
Fix hqq issue
#38551 merged
Jun 3, 2025 -
Name change AOPermod -> ModuleFqn
#38456 merged
Jun 3, 2025 -
Fix utils/notification_service.py
#38556 merged
Jun 3, 2025 -
Explicitly setting encoding in tokenization_utils_base.py
#38553 merged
Jun 3, 2025 -
[TP] Change command in tests to python3
#38555 merged
Jun 3, 2025 -
[bugfix] fix apply_rotary_emb error on Ascend NPU
#38491 merged
Jun 3, 2025 -
Update docker image to use av==10.0.0
#38548 merged
Jun 3, 2025 -
update emu3 test
#38543 merged
Jun 3, 2025 -
Don't use default attn if pre-set in sub-config
#38526 merged
Jun 3, 2025 -
[tests] expand flex-attn test for vision models
#38434 merged
Jun 3, 2025 -
Fix blip2 tests
#38510 merged
Jun 2, 2025 -
Fix Gemma2IntegrationTest
#38492 merged
Jun 2, 2025 -
Remove type annotation in Siglip Attention Module
#38503 merged
Jun 2, 2025 -
Num parameters in model.safetensors.index.json
#38531 merged
Jun 2, 2025 -
[flax/mistral] support sliding_window: null in config
#37402 merged
Jun 2, 2025 -
Fix amp deprecation issue
#38100 merged
Jun 2, 2025 -
remove unhandled parameter
#38145 merged
Jun 2, 2025 -
Add ColQwen2 to 🤗 transformers
#35778 merged
Jun 2, 2025 -
[generate] move SinkCache to a custom_generate repo
#38399 merged
Jun 2, 2025 -
[generate] add soft deprecations on custom generation methods
#38406 merged
Jun 2, 2025 -
Update Loss Functions to Accept Tensor num_items_in_batch
#38029 merged
Jun 2, 2025 -
[seamless_m4t] Skip some tests when speech is not available
#38430 merged
Jun 2, 2025 -
Fix setting FLASH_ATTENTION_DETERMINISTIC after importing
#37185 merged
Jun 2, 2025 -
Remove deprecated use_flash_attention_2 parameter
#37131 merged
Jun 2, 2025 -
[docs] add xpu environment variable for gpu selection
#38194 merged
May 30, 2025 -
protect dtensor import
#38496 merged
May 30, 2025 -
Align TP check
#38328 merged
May 30, 2025 -
[Tests] Reduced model size for albert-test model
#38480 merged
May 30, 2025 -
Bump torch from 2.2.0 to 2.6.0 in /examples/flax/vision
#37618 merged
May 30, 2025 -
Fix incorrect bbox_embed initialization when decoder_bbox_embed_share=False in GroundingDINO
#38238 merged
May 30, 2025 -
Fix convert_internvl_weights_to_hf.py to support local paths
#38264 merged
May 30, 2025 -
Make patch helper more helpful
#38409 merged
May 30, 2025 -
fix: handle no scheduler passed by user
#38407 merged
May 30, 2025 -
[Qwen2.5-Omni] Fix dtype of cos,sin when used with flash attention
#38453 merged
May 29, 2025 -
Fix Gemma3IntegrationTest
#38471 merged
May 29, 2025 -
Cleanup BatchFeature and BatchEncoding
#38459 merged
May 29, 2025 -
Fix TypeError in save_pretrained error handling (fixes #38422)
#38449 merged
May 29, 2025 -
🔴 [VLM] modeling updates
#38317 merged
May 29, 2025 -
[Tests] Clean up test cases for few models
#38315 merged
May 29, 2025
53 Pull requests opened by 40 people
-
Add glpn fast processor
#38461 opened
May 29, 2025 -
fix torch_dtype on awq
#38463 opened
May 29, 2025 -
Fix trainer.py not showing signature columns
#38465 opened
May 29, 2025 -
Fix HQQ model param device transfer issue
#38466 opened
May 29, 2025 -
[VLMs] support passing embeds along with pixels
#38467 opened
May 29, 2025 -
Add detailed ConvBERT model card with usage, architecture, and refere…
#38470 opened
May 29, 2025 -
Updated Aria model card
#38472 opened
May 29, 2025 -
docs: Add Turkish translation for README
#38473 opened
May 29, 2025 -
Avoid overwrite existing local implementation when loading remote custom model
#38474 opened
May 29, 2025 -
Refactor DBRX tests to use CausalLMModelTest base classes
#38475 opened
May 29, 2025 -
Fix meta tensor copy error
#38478 opened
May 29, 2025 -
[static cache] fix device map per layer in VLMs
#38488 opened
May 30, 2025 -
lazy cache init
#38495 opened
May 30, 2025 -
Add ZoeDepthImageProcessorFast: PyTorch-native Fast Image Preprocessing for ZoeDepth
#38497 opened
May 30, 2025 -
Add fast imageprocessor vitpose
#38502 opened
May 31, 2025 -
Fixed markdown for BertTokenizer's '[CLS]' token.
#38506 opened
May 31, 2025 -
Fix initialization of a pretrained backbone
#38512 opened
Jun 1, 2025 -
Update blip model card
#38513 opened
Jun 1, 2025 -
added fast image processor for ZoeDepth and expanded tests accordingly
#38515 opened
Jun 1, 2025 -
Fix `return_dict=False` giving errors in a few VLM models
#38519 opened
Jun 1, 2025 -
Add QuasarV4 model
#38520 opened
Jun 1, 2025 -
[qwen-omni] fix sliding window
#38525 opened
Jun 2, 2025 -
Logging message for `is_bitsandbytes_available()`
#38528 opened
Jun 2, 2025 -
Fix to make vllm happy
#38530 opened
Jun 2, 2025 -
On branch fix-void-segment-mask-input [WIP]
#38532 opened
Jun 2, 2025 -
another way to use shift_labels
#38533 opened
Jun 2, 2025 -
fixed a bug which was causing only partial files to be imported
#38534 opened
Jun 2, 2025 -
Update data collator to support sequence_length
#38536 opened
Jun 2, 2025 -
Allow `mlm_probability` to be set to `None` when `mlm=False` in DataCollatorForLanguageModeling (#38522)
#38537 opened
Jun 2, 2025 -
[docs] transformers-cli command
#38539 opened
Jun 2, 2025 -
Image processor compile fix
#38540 opened
Jun 3, 2025 -
[WIP chat template] return assistant mask in processors
#38545 opened
Jun 3, 2025 -
Fix CTRL model DataParallel compatibility
#38547 opened
Jun 3, 2025 -
Improve GPTNeoX model card following standardization guidelines
#38550 opened
Jun 3, 2025 -
Better CI
#38552 opened
Jun 3, 2025 -
[masking utils] check `None` instead of try/except
#38561 opened
Jun 3, 2025 -
Fix ModernBERT tokenizer issue with is_split_into_words flag
#38564 opened
Jun 3, 2025 -
Fix `FalconMambaIntegrationTests`
#38566 opened
Jun 3, 2025 -
Update Wav2Vec2 documentation to create model cards
#38568 opened
Jun 3, 2025 -
Add Bagel
#38569 opened
Jun 3, 2025 -
enable more test cases on xpu
#38572 opened
Jun 4, 2025 -
Fix `MiniMax` (docs and integration tests checkpoint)
#38575 opened
Jun 4, 2025 -
Disable custom MRA kernels for ROCm
#38578 opened
Jun 4, 2025 -
blt wip
#38579 opened
Jun 4, 2025 -
update `ColQwen2ModelIntegrationTest`
#38583 opened
Jun 4, 2025 -
Fix zero rotary dim
#38584 opened
Jun 4, 2025 -
[don't merge yet] Fix RAG
#38585 opened
Jun 4, 2025 -
docs: fix dark mode logo display.
#38586 opened
Jun 4, 2025 -
Refactor Bamba tests to inherit from CausalLMModelTester base classes
#38587 opened
Jun 4, 2025 -
Remove custom pytest and pluggy
#38589 opened
Jun 4, 2025 -
Fix: Correctly handle integer device_map for NPU devices in _load_sta…
#38591 opened
Jun 4, 2025 -
[docs] Add int4wo + 2:4 sparsity example to TorchAO README
#38592 opened
Jun 4, 2025 -
Remove IPEX requirement for bitsandbytes on CPU
#38594 opened
Jun 4, 2025
50 Issues closed by 22 people
-
Qwen2.5-VL using ascend NPU with flash-attention-2 raises error
#38189 closed
Jun 4, 2025 -
Community contribution: enabling `device_map="auto"` support for more vision and multimodal models
#29786 closed
Jun 4, 2025 -
Offline mode doesn't work with models that require `trust_remote_code=True`
#34855 closed
Jun 4, 2025 -
allow custom head_dim for qwen2_moe
#37187 closed
Jun 4, 2025 -
Support for excel files
#38567 closed
Jun 4, 2025 -
404 Client Error when accessing https://router.huggingface.co/nebius/v1/chat/completions endpoint
#38524 closed
Jun 4, 2025 -
TapasTokenizer Produces All Zero token_type_ids Even with Tutorial Data
#37183 closed
Jun 4, 2025 -
Whisper chunking algorithm increases WER
#37789 closed
Jun 4, 2025 -
AutomaticMaskGeneration does not work with batch_size greater than 1
#37805 closed
Jun 4, 2025 -
ValueError: size must contain 'shortest_edge' and 'longest_edge' keys.
#37811 closed
Jun 4, 2025 -
Add support for MiniMax-Text-01 and MiniMax-VL-01 from MiniMaxAI
#35710 closed
Jun 4, 2025 -
1
#38574 closed
Jun 4, 2025 -
register_quantizer or register_quantization_config does not add new method to QuantizationMethod
#38462 closed
Jun 4, 2025 -
Hang in quantized_phi::ModelWeights::forward() with Phi-2 GGUF on CPU (Candle main branch)
#38516 closed
Jun 3, 2025 -
Why do you remove sample_indices_fn for processor.apply_chat_template?
#38527 closed
Jun 3, 2025 -
torch.compile fails for gemma-3-1b-it
#38501 closed
Jun 2, 2025 -
num_items_in_batch should be moved to logits.device in ForCausalLMLoss too
#37886 closed
Jun 2, 2025 -
ImportError: cannot import name 'amp' from 'apex'
#38095 closed
Jun 2, 2025 -
Release Tag Changed, Breaking Checksums, and AUR Package Building
#37090 closed
Jun 2, 2025 -
Loading and Saving Pretrained model to the same directory raises SafeTensorError: IOError
#37713 closed
Jun 2, 2025 -
Failed to load model with transformers 4.51.3 when WORLD_SIZE set to 1 on nvidia gpu
#37737 closed
Jun 2, 2025 -
[Trainer] As gradient_accumulation_steps increases, the loss also increases
#37766 closed
Jun 2, 2025 -
error: subprocess-exited-with-error when installing transformers
#37775 closed
Jun 2, 2025 -
DTensor Import Path Changes in PyTorch 2.5 Causing Compatibility Issues
#38251 closed
Jun 2, 2025 -
TypeError: CustomTrainer.compute_loss() got an unexpected keyword argument 'num_items_in_batch'
#36331 closed
Jun 1, 2025 -
Can't perform inference with images on Gemma-3-12b-it-qat-int4.0
#37710 closed
Jun 1, 2025 -
Very slow model instantiation
#37712 closed
Jun 1, 2025 -
Model Request: SLaM (Sparse Latent Mixer) – Multimodal Flamingo Alternative
#38508 closed
May 31, 2025 -
bug in new prefill_chunk_size implementation
#38028 closed
May 31, 2025 -
CVE-2024-11392 - AWS Scanner and Trivy Flagging Transformers 4.48.1 as Vulnerable
#36041 closed
May 31, 2025 -
Object detection tutorial uses buggy dataset, may lead to crash during training
#36516 closed
May 31, 2025 -
Tokenizing with `apply_chat_template` behaves differently from regular tokenizing
#37686 closed
May 31, 2025 -
num_items_in_batch larger than the actual useful token when computing loss
#38448 closed
May 31, 2025 -
ImportError: cannot import name 'DTensor' from 'torch.distributed.tensor'
#38494 closed
May 30, 2025 -
[Tests] Testing for ALBERT is quite slow
#38344 closed
May 30, 2025 -
Unexpected Zero Probabilities with siglip2-base-patch16-224 Model
#38175 closed
May 30, 2025 -
VLM reverse mapping logic in modeling_utils.py save_pretrained not doing anything?
#38489 closed
May 30, 2025 -
TypeError in Llama-4-Maverick-17B-128E-Instruct-FP8 Resolved with Workaround
#38283 closed
May 30, 2025 -
A shallow copy in groundingdino
#37333 closed
May 30, 2025 -
convert_internvl_weights_to_hf.py does not support model in a local path
#38200 closed
May 30, 2025 -
[Bug - Qwen2.5-Omni] FlashAttention 2 BF16 dtype mismatch persists in `apply_rotary_pos_emb_flashatt`
#38451 closed
May 29, 2025 -
Bug in error handling routine in save_pretrained
#38422 closed
May 29, 2025 -
add Flash Attention Support for Helsinki-NLP/opus models
#36169 closed
May 29, 2025 -
Gemma3 (and Paligemma) position_ids 1-indexed?
#36856 closed
May 29, 2025 -
bitnet
#37632 closed
May 29, 2025 -
Error message is misleading for missing protobuf
#37641 closed
May 29, 2025 -
phi-4-mm HF format
#38120 closed
May 29, 2025 -
_register_pytree_node error in torch2.1.0 and bf16 assertion error for XPU and NPU
#37838 closed
May 29, 2025
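A recurring theme in the closed issues is loss normalization (#38448, #37886): the denominator should count only label positions that actually contribute to the loss, not padding or prompt positions. A pure-Python sketch of that counting rule, using the conventional `-100` ignore index (the helper itself is illustrative, not transformers code):

```python
IGNORE_INDEX = -100  # conventional "skip this position" label value

def count_loss_tokens(labels: list[list[int]]) -> int:
    """Count label positions contributing to the loss, i.e. everything
    in the batch except positions marked with IGNORE_INDEX."""
    return sum(1 for row in labels for label in row if label != IGNORE_INDEX)
```

Dividing a summed cross-entropy by this count, rather than by the raw batch-times-sequence-length, is what keeps the reported loss comparable across padding regimes.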
32 Issues opened by 32 people
-
Consider Deprecating Sigopt from Hyperparameter Search
#38593 opened
Jun 4, 2025 -
"facebook/opt-125m" gives wrong results
#38590 opened
Jun 4, 2025 -
Transformers fail to load deepseek-ai/DeepSeek-V3 with vllm
#38588 opened
Jun 4, 2025 -
Request to add the small-doge model
#38573 opened
Jun 4, 2025 -
Oneke not utilizing much from GPU(Nvidia L20)
#38571 opened
Jun 4, 2025 -
Possible Typo in "Mask2FormerLoss"
#38559 opened
Jun 3, 2025 -
hidden_states, self_attn_weights = self.self_attn( ValueError: too many values to unpack (expected 2)
#38554 opened
Jun 3, 2025 -
Clarification on default top_k sampling parameter
#38549 opened
Jun 3, 2025 -
Paligemma model card needs update
#38544 opened
Jun 3, 2025 -
enable GraniteMoeHybridIntegrationTest in UT
#38542 opened
Jun 3, 2025 -
`eager_attention_forward` and `repeat_kv` code duplication
#38541 opened
Jun 3, 2025 -
Hidden states are different for model() and model.generate()
#38538 opened
Jun 2, 2025 -
Streaming mode support on HF vs kyutai-labs for the mimi model
#38535 opened
Jun 2, 2025 -
"Size mismatch" error when trying to download pretrained ChatGPT-4 using transformers
#38523 opened
Jun 2, 2025 -
Allow `mlm_probability` to be set to None when `mlm`=False in `DataCollatorForLanguageModeling`
#38522 opened
Jun 2, 2025 -
Error for `return_assistant_tokens_mask` in MLLM processor
#38521 opened
Jun 2, 2025 -
Failed to export PyTorch traced graph of Mixtral-8x7B-Instruct-v0.1 due to the PR #32429
#38518 opened
Jun 1, 2025 -
model_type = self._reverse_config_mapping[key.__name__] KeyError: 'Qwen2RMConfig'
#38517 opened
Jun 1, 2025 -
Can not reproduce Blip2ForImageTextRetrieval example from docs, getting different results
#38514 opened
Jun 1, 2025 -
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
#38509 opened
May 31, 2025 -
id2label assignment problem in run_glue.py
#38507 opened
May 31, 2025 -
Unable to deploy Gemma 3 on AWS SageMaker due to lack of support in transformers release
#38500 opened
May 30, 2025 -
ModernBERT for MLM outputs incorrect hidden state shape.
#38499 opened
May 30, 2025 -
[Florence-2] SyntaxWarning: invalid escape sequence '\d' in processing_florence2.py
#38498 opened
May 30, 2025 -
Clarification on per_device_train_batch_size in Trainer
#38484 opened
May 30, 2025 -
Transformers 4.41.0 does not recognize 'gemma2' model type for google/gemma-2-2b
#38482 opened
May 29, 2025 -
Token shape issue in LLaVA-onevision fine-tuning
#38481 opened
May 29, 2025 -
ImportError: DLL load failed while importing _safetensors_rust: The specified module could not be found
#38479 opened
May 29, 2025 -
Pickle error when downloading DeepSeek model
#38476 opened
May 29, 2025 -
AssertionError: Torch not compiled with CUDA enabled when using device_map="auto" in Ascend NPU
#38468 opened
May 29, 2025 -
We now require users to upgrade torch to at least v2.6 in order to use the function.
#38464 opened
May 29, 2025 -
Incorrect API call
#38457 opened
May 28, 2025
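Issue #38549 asks for clarification on the default `top_k` sampling parameter. As background, top-k filtering keeps only the k highest-scoring logits before sampling; a self-contained sketch of the idea (illustrative only, not the actual transformers logits-processor code):

```python
import math

def top_k_filter(logits: list[float], k: int) -> list[float]:
    """Set all but the k largest logits to -inf so softmax ignores them.
    Ties at the threshold may keep more than k entries."""
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(x - m) for x in logits]     # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]
```

After filtering, sampling from the softmax output can only ever pick one of the surviving k tokens, which is the whole effect of the parameter.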
116 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add support for Florence-2
#38188 commented on
Jun 4, 2025 • 73 new comments -
GLM-4-0414 Change
#38431 commented on
Jun 4, 2025 • 39 new comments -
Split `transformers chat` and `transformers serve`
#38443 commented on
Jun 3, 2025 • 31 new comments -
Add EoMT Model
#37610 commented on
Jun 4, 2025 • 27 new comments -
support MiniCPM-o2.6
#37917 commented on
Jun 3, 2025 • 14 new comments -
Add LightGlue model
#31718 commented on
Jun 4, 2025 • 11 new comments -
Add Fast Image Processor for mobileViT
#37143 commented on
Jun 2, 2025 • 7 new comments -
Add X-Codec model
#38248 commented on
May 30, 2025 • 7 new comments -
Encoder-Decoder Gemma
#38332 commented on
May 30, 2025 • 6 new comments -
fix total batch size calculation in trainer
#38286 commented on
Jun 4, 2025 • 5 new comments -
support overlapping masks in mask2former image processor
#37357 commented on
Jun 3, 2025 • 3 new comments -
Skip non-selected experts for qwen3_moe
#38133 commented on
Jun 2, 2025 • 3 new comments -
[WIP] Perception lm
#37878 commented on
May 31, 2025 • 2 new comments -
Add Ovis2 model and processor implementation
#37088 commented on
Jun 4, 2025 • 2 new comments -
Add kernelize to transformers
#38205 commented on
Jun 3, 2025 • 2 new comments -
Enable tracing for Moshi
#36894 commented on
Jun 2, 2025 • 2 new comments -
Fix Whisper inference regression with backward-compatible logprob calculation
#38388 commented on
Jun 3, 2025 • 2 new comments -
Add Dia model
#38405 commented on
Jun 4, 2025 • 2 new comments -
Refactor `MambaCache` to `modeling_mamba.py` (parity with Zamba)
#38086 commented on
Jun 2, 2025 • 1 new comment -
feat: support indivisible shards for TP model loading and TPlizing.
#37220 commented on
Jun 2, 2025 • 1 new comment -
[trainer] ensure special tokens in model configs are aligned with tokenizer at train time
#38441 commented on
Jun 3, 2025 • 1 new comment -
[Validation] First implementation of `@strict` from `huggingface_hub`
#36534 commented on
May 30, 2025 • 1 new comment -
Update tokenization_utils_base.py
#37512 commented on
Jun 2, 2025 • 0 new comments -
internalize build_inputs_with_special_tokens and prepare_for_model
#37522 commented on
Jun 2, 2025 • 0 new comments -
Fix interpolation of convnext image processor
#37460 commented on
Jun 4, 2025 • 0 new comments -
Fast tokenizer encoding doesn't handle empty string input
#37537 commented on
Jun 2, 2025 • 0 new comments -
[Cache] Support compilable cache reuse with smaller batch sizes
#37394 commented on
Jun 2, 2025 • 0 new comments -
Add configurable normalization schemes to SigLIP image processors
#38444 commented on
May 29, 2025 • 0 new comments -
handle training summary when creating modelcard but offline mode is set
#37095 commented on
Jun 2, 2025 • 0 new comments -
[draft] random tests order
#37082 commented on
Jun 2, 2025 • 0 new comments -
Adding a stub for MiniCPM-o to the models
#37049 commented on
Jun 3, 2025 • 0 new comments -
Add Fast Segformer Processor
#37024 commented on
Jun 4, 2025 • 0 new comments -
Add Fast SamImageProcessor
#36999 commented on
Jun 1, 2025 • 0 new comments -
Add support for specifying revisions when pushing to Hub via internal Trainer call
#36852 commented on
Jun 2, 2025 • 0 new comments -
Support loading custom code objects (`trust_remote_code=True`) in offline mode from local
#36808 commented on
Jun 4, 2025 • 0 new comments -
Add Aimv2 model
#36625 commented on
May 29, 2025 • 0 new comments -
Fix edge case for tokenize (#36277)
#36555 commented on
May 29, 2025 • 0 new comments -
[Qwen2.5-VL] Fix empty string input crash in processor
#38421 commented on
May 29, 2025 • 0 new comments -
Lag kv cache
#38364 commented on
Jun 3, 2025 • 0 new comments -
align xpu's autocast behavior w/ cuda by using device agnostic torch APIs
#38284 commented on
Jun 4, 2025 • 0 new comments -
Add zero dim tensor check when using flash_attention
#38280 commented on
May 30, 2025 • 0 new comments -
[docs] Tensor parallelism
#38241 commented on
Jun 2, 2025 • 0 new comments -
Add SVE implementation for Mamba Sequential Scan Algorithm
#38185 commented on
Jun 3, 2025 • 0 new comments -
[WIP] new BLT
#38173 commented on
May 30, 2025 • 0 new comments -
Fix FSDP + llava-next/llava-onevision
#38141 commented on
Jun 4, 2025 • 0 new comments -
Cache System Refactor: Layered Architecture
#38077 commented on
Jun 4, 2025 • 0 new comments -
update loss computation in modeling code
#37993 commented on
Jun 2, 2025 • 0 new comments -
Add dia
#37941 commented on
Jun 3, 2025 • 0 new comments -
[WIP] Add MM Grounding DINO
#37925 commented on
May 30, 2025 • 0 new comments -
Add DEIM object detection model
#37875 commented on
Jun 1, 2025 • 0 new comments -
[WiP] Add xcodec2 model
#37868 commented on
Jun 4, 2025 • 0 new comments -
Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM`
#37830 commented on
May 30, 2025 • 0 new comments -
qwen null pointer check.
#37810 commented on
Jun 2, 2025 • 0 new comments -
Update ruff to 0.11.7 and some fixes
#37809 commented on
Jun 4, 2025 • 0 new comments -
fix qwen2.5-omini cant be loaded from AutoModel
#37795 commented on
Jun 2, 2025 • 0 new comments -
Adding features like Tokenizer evaluation/benchmarking
#37792 commented on
Jun 2, 2025 • 0 new comments -
Updated Albert model Card
#37753 commented on
Jun 4, 2025 • 0 new comments -
refactor create_token_type_ids_from_sequences
#37681 commented on
Jun 2, 2025 • 0 new comments -
Non model inits
#37653 commented on
Jun 2, 2025 • 0 new comments -
Add DeepSeek V2 Model into Transformers
#36400 commented on
Jun 3, 2025 • 0 new comments -
💡 Proposal: Add temporal-grounding pipeline for video-language tasks
#38450 commented on
Jun 1, 2025 • 0 new comments -
Convnext image preprocessor raises an AssertionError when comparing logits
#37461 commented on
Jun 1, 2025 • 0 new comments -
Weights not initialized correctly when instantiating model with a pretrained backbone
#38061 commented on
Jun 1, 2025 • 0 new comments -
Please support GGUF format for UMT5EncoderModel
#36774 commented on
May 31, 2025 • 0 new comments -
Any plans on adding Flash Attention 3?
#33373 commented on
May 31, 2025 • 0 new comments -
401 Unauthorized Error: "Invalid credentials" on POST requests to Inference API from multiple services
#38289 commented on
May 31, 2025 • 0 new comments -
[BUG] Batch inference DDP + zero stage 3 = inference code hangs
#36638 commented on
May 31, 2025 • 0 new comments -
ModernBert Tokenizer flag `is_split_into_words` not working
#37883 commented on
May 31, 2025 • 0 new comments -
Error in input expansion for `generate` with `num_return_sequences` > 1 for multi-image inputs to `AutoModelForImageTextToText`
#37900 commented on
May 31, 2025 • 0 new comments -
Object detection training/fine-tuning for Owl-vit/Owlv2
#33664 commented on
May 31, 2025 • 0 new comments -
OWL-ViT training / fine-tuning code
#20091 commented on
May 31, 2025 • 0 new comments -
Gibberish generations with FSDP2 and MixedPrecisionPolicy
#38190 commented on
May 30, 2025 • 0 new comments -
A type error in the Template writing document
#37524 commented on
May 30, 2025 • 0 new comments -
ImageInput doesn't include JAX ndarray and TensorFlow tensor
#37857 commented on
May 30, 2025 • 0 new comments -
BUG: ModernBERT flash-attention2 incompatible on Ascend NPU
#37859 commented on
May 30, 2025 • 0 new comments -
Llama2 can output scores normally, but Llama3 outputs full inf
#37862 commented on
May 30, 2025 • 0 new comments -
WhisperForCTC
#26242 commented on
May 30, 2025 • 0 new comments -
Potential mix-up with IMAGENET_STANDARD and IMAGENET_DEFAULT values
#38318 commented on
May 30, 2025 • 0 new comments -
Version 4.52.3 leads to error after bundling with pyinstaller
#38402 commented on
May 29, 2025 • 0 new comments -
Memory saving by upcasting logits for only non-ignored positions
#38452 commented on
May 29, 2025 • 0 new comments -
accelerate + device_map auto = error
#38408 commented on
May 29, 2025 • 0 new comments -
Allow video objects (np array etc.) in apply_chat_template (not just paths or urls)
#36560 commented on
May 29, 2025 • 0 new comments -
The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct
#33399 commented on
May 29, 2025 • 0 new comments -
Gemma3: Cuda error: misaligned address
#36961 commented on
May 29, 2025 • 0 new comments -
Decoder Attention Mask is not passed to the VisionEncoderDecoderModel during training!!
#37823 commented on
May 29, 2025 • 0 new comments -
AttentionMaskVisualizer hard-code sliding_window to 5 in transformers code.
#37851 commented on
May 29, 2025 • 0 new comments -
Will Trainer.predict() return data in the same order as the original dataset during multi-machine and multi-gpus inference?
#33728 commented on
May 29, 2025 • 0 new comments -
Add support for BAGEL from ByteDance
#38267 commented on
May 29, 2025 • 0 new comments -
Add evolla rebase main
#36232 commented on
Jun 3, 2025 • 0 new comments -
Add Doge model
#35891 commented on
Jun 3, 2025 • 0 new comments -
Integrate xlstm cleanly.
#35377 commented on
Jun 4, 2025 • 0 new comments -
Correctly support resuming from checkpoint with a dataset without length
#33544 commented on
Jun 3, 2025 • 0 new comments -
Add Segment Anything 2 (SAM2)
#32317 commented on
Jun 4, 2025 • 0 new comments -
[Community contributions] Model cards
#36979 commented on
Jun 4, 2025 • 0 new comments -
Beit image classification have different results compared from versions prior to 4.43.0
#34446 commented on
Jun 4, 2025 • 0 new comments -
Processor multiprocessing error when load custom processor
#37637 commented on
Jun 4, 2025 • 0 new comments -
MedGemma worked fine prior to 4.52.3 release but now errors
#38333 commented on
Jun 4, 2025 • 0 new comments -
LagKV for key-value compression
#38312 commented on
Jun 4, 2025 • 0 new comments -
`ConditionalDetrImageProcessor` still accepts the deprecated parameter `max_size`
#37939 commented on
Jun 4, 2025 • 0 new comments -
Errors using TinyLlama-1.1B-Chat-v1.0 and DirectML
#38340 commented on
Jun 4, 2025 • 0 new comments -
Add RoMa keypoint matcher
#36718 commented on
Jun 3, 2025 • 0 new comments -
Maybe the vocab_size can be duplicated to the mainconfig for PEFT to pick up
#38017 commented on
Jun 3, 2025 • 0 new comments -
Shape Error in Llama4VisionMLP2
#37321 commented on
Jun 3, 2025 • 0 new comments -
[Bug] Gemma3Processor.apply_chat_template returns Tensor instead of dict with long multimodal few-shot inputs
#37943 commented on
Jun 3, 2025 • 0 new comments -
Alternative to trainer.hyperparameter_search for models used with custom optimizer / lrscheduler etc.
#37945 commented on
Jun 3, 2025 • 0 new comments -
Add examples that showcase the use of Hyperparameter search with Transformers
#37947 commented on
Jun 3, 2025 • 0 new comments -
[Contributions Welcome] Add Fast Image Processors
#36978 commented on
Jun 2, 2025 • 0 new comments -
Model implmenetation using Liger Kernel layers
#38416 commented on
Jun 2, 2025 • 0 new comments -
quantizer_hqq should not require a gpu/cuda device to run
#38439 commented on
Jun 2, 2025 • 0 new comments -
Add Gemma 3 For Sequence Classification
#36755 commented on
Jun 2, 2025 • 0 new comments -
Recomputed tensor size does not match when using activation checkpointing when using FSDP and accelerate
#34928 commented on
Jun 2, 2025 • 0 new comments -
request the support for training support for QuantizationMethod.FP8
#37927 commented on
Jun 2, 2025 • 0 new comments -
Updates in type-checking specifications have broken transformers' types
#37928 commented on
Jun 2, 2025 • 0 new comments -
Is Llama4TextL2Norm meant to be RMS norm?
#37934 commented on
Jun 2, 2025 • 0 new comments -
[i18n-TR] Translating docs to Turkish
#27088 commented on
Jun 1, 2025 • 0 new comments -
transformers showing decoder model architecture detected so padding should be left
#38071 commented on
Jun 1, 2025 • 0 new comments