Insights: huggingface/transformers
Overview
5 Releases published by 2 people
v4.49.0-AyaVision Aya Vision (Based on v4.49.0)
published
Mar 4, 2025 -
v4.49.0-Gemma-3 Gemma 3 (Based on v4.49.0)
published
Mar 18, 2025 -
v4.49.0-Mistral-3 Mistral 3 (Based on v4.49.0)
published
Mar 18, 2025 -
v4.50.0 Release v4.50.0
published
Mar 21, 2025 -
v4.50.1 Patch release v4.50.1
published
Mar 25, 2025
244 Pull requests merged by 88 people
[docs] Attention mask image
#36970 merged
Mar 26, 2025 -
Remove deprecated training arguments
#36946 merged
Mar 26, 2025 -
fix typos in the code comments and error messages
#36993 merged
Mar 26, 2025 -
Log the correct learning rate
#36973 merged
Mar 26, 2025 -
Fix device_map check for ggml files
#37003 merged
Mar 26, 2025 -
Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support.
#36975 merged
Mar 26, 2025 -
Allow easy registration of custom attention functions
#36889 merged
Mar 26, 2025 -
Fix get_device_properties
#36997 merged
Mar 26, 2025 -
Fix Optional type annotation
#36841 merged
Mar 26, 2025 -
Install networkx==3.2.1 manually in some CircleCI jobs after #36957
#37000 merged
Mar 26, 2025 -
Use torch.expm1
#36995 merged
Mar 26, 2025 -
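The "Use torch.expm1" entry above replaces `exp(x) - 1` computations with the fused `expm1` primitive. The numerical rationale can be illustrated with Python's stdlib `math.expm1`, which has the same semantics as the torch version (the values below are illustrative, not from the PR):

```python
import math

# For small x, exp(x) - 1 suffers catastrophic cancellation: exp(x) is
# rounded near 1.0, so the subtraction destroys most significant digits.
# expm1(x) evaluates exp(x) - 1 directly and stays accurate.
x = 1e-12
naive = math.exp(x) - 1.0   # noticeably wrong in the low digits
stable = math.expm1(x)      # accurate to machine precision

rel_err_naive = abs(naive - x) / x
rel_err_stable = abs(stable - x) / x
print(rel_err_naive > 1e-6)   # True: naive form is off by a visible margin
print(rel_err_stable < 1e-9)  # True: expm1 keeps full precision
```

The same reasoning applies to `torch.expm1` on tensors; the PR swaps the subtraction form for the fused op.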
byebye CircleCI TF jobs
#36998 merged
Mar 26, 2025 -
Fix tensor dtype mismatch
#36985 merged
Mar 26, 2025 -
🚨Deprecate legacy argument for image-text-to-text models and adopt new behavior by default
#36307 merged
Mar 25, 2025 -
update bot comment again
#36974 merged
Mar 25, 2025 -
Add ruff target-version
#36971 merged
Mar 25, 2025 -
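The "Add ruff target-version" entry above pins the Python version ruff assumes when linting and autofixing. In a `pyproject.toml` this typically looks like the fragment below; the `py39` value is an assumption inferred from the repository's `pyupgrade --py39-plus` cleanups listed elsewhere on this page, not confirmed from the PR:

```toml
[tool.ruff]
# Pin the Python version ruff targets so autofixes never introduce
# syntax newer than the project's minimum supported interpreter.
target-version = "py39"
```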
[docs] Fix image link
#36869 merged
Mar 25, 2025 -
Remove extra tensor clone in PyTorch code
#36748 merged
Mar 25, 2025 -
update
#36972 merged
Mar 25, 2025 -
Updated docker files to use uv for installing packages
#36957 merged
Mar 25, 2025 -
typo fixed in README_fr.md
#36951 merged
Mar 25, 2025 -
Change GPUS to GPUs
#36945 merged
Mar 25, 2025 -
Update after #36962
#36965 merged
Mar 25, 2025 -
Update ruff to 0.11.2
#36962 merged
Mar 25, 2025 -
[Utils] torch version checks optionally accept dev versions
#36847 merged
Mar 25, 2025 -
Fix cuda index issue in cache allocator
#36937 merged
Mar 25, 2025 -
Support return_tensors in audio chat templates
#34601 merged
Mar 25, 2025 -
fix typos in the tests directory
#36932 merged
Mar 25, 2025 -
Export for Phi4-mini
#36780 merged
Mar 25, 2025 -
Fixing _pre_quantization_dtype when torch_dtype is None
#36930 merged
Mar 25, 2025 -
Add Phi4 multimodal
#36939 merged
Mar 25, 2025 -
Deprecate #36741 and map Causal to Conditional
#36917 merged
Mar 25, 2025 -
Disallow Offload to disk for gguf files
#36933 merged
Mar 24, 2025 -
Fix processor kwargs qwen2 vl
#36890 merged
Mar 24, 2025 -
Added support for seed in DataCollatorForWholeWordMask
#36903 merged
Mar 24, 2025 -
More precise comment
#36935 merged
Mar 24, 2025 -
Fix pytorch deformable attn path
#36923 merged
Mar 24, 2025 -
[2/N] Use pyupgrade --py39-plus to improve code
#36857 merged
Mar 24, 2025 -
Update trainer_pt_utils.py docstrings for consistency
#36912 merged
Mar 24, 2025 -
Fix typos
#36910 merged
Mar 24, 2025 -
Use another repo. for Mistral3 processor testing
#36925 merged
Mar 24, 2025 -
Fix Compressed tensors to_dict_diff
#36922 merged
Mar 24, 2025 -
[chameleon] fix num image token check
#36918 merged
Mar 24, 2025 -
tests: fix asyncio.wait() usage for python>=3.11
#36898 merged
Mar 24, 2025 -
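The asyncio fix above addresses a Python 3.11 behavior change: `asyncio.wait()` no longer accepts bare coroutines, so callers must wrap them in tasks first. A minimal sketch of the corrected pattern (the `fetch` coroutine is illustrative, not from the test suite):

```python
import asyncio

async def fetch(value: int) -> int:
    # Stand-in for real async work in a test.
    await asyncio.sleep(0)
    return value * 2

async def main() -> list[int]:
    # Since Python 3.11, asyncio.wait() requires Task/Future objects;
    # passing coroutines directly raises TypeError.
    tasks = [asyncio.create_task(fetch(i)) for i in range(3)]
    done, pending = await asyncio.wait(tasks)
    assert not pending  # default return_when is ALL_COMPLETED
    return sorted(task.result() for task in done)

print(asyncio.run(main()))  # [0, 2, 4]
```

On 3.8 through 3.10 passing coroutines only emitted a DeprecationWarning, which is why the breakage surfaced with the 3.11 upgrade.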
[Fix] Add original_max_position_embeddings to YARN rope_scaling optional keys
#36877 merged
Mar 24, 2025 -
Fix torch version guard at import
#36907 merged
Mar 24, 2025 -
fix Gemma3 Config
#36893 merged
Mar 24, 2025 -
Update installation.md
#36826 merged
Mar 21, 2025 -
[docs] Model docs
#36469 merged
Mar 21, 2025 -
Fix Pan and Scan on batched images Gemma3
#36864 merged
Mar 21, 2025 -
Simplify keep_in_fp32_modules logic
#36722 merged
Mar 21, 2025 -
fix: loss computation after embeddings resize - mllama
#36840 merged
Mar 21, 2025 -
Fix: dtype cannot be str
#36262 merged
Mar 21, 2025 -
Minor Gemma 3 fixes
#36884 merged
Mar 21, 2025 -
Use deformable_detr kernel from the Hub
#36853 merged
Mar 21, 2025 -
Gemma 3 tests expect greedy decoding
#36882 merged
Mar 21, 2025 -
🔴 🔴 🔴 supersede paligemma forward to shift pos id indexing
#36859 merged
Mar 21, 2025 -
[generate] model defaults being inherited only happens for newer models
#36881 merged
Mar 21, 2025 -
Revert "Update deprecated Jax calls (#35919)"
#36880 merged
Mar 21, 2025 -
Make ViTPooler configurable
#36517 merged
Mar 21, 2025 -
chore: fix typos in the tests directory
#36813 merged
Mar 21, 2025 -
Remove call to .item in get_batch_samples
#36861 merged
Mar 21, 2025 -
FIX FSDP plugin update for QLoRA
#36720 merged
Mar 21, 2025 -
[CI] doc builder without custom image
#36862 merged
Mar 21, 2025 -
Mllama: raise better error
#35934 merged
Mar 21, 2025 -
Refactor Aya Vision with modular
#36688 merged
Mar 20, 2025 -
Add support for seed in DataCollatorForLanguageModeling
#36497 merged
Mar 20, 2025 -
[CI] fix update metadata job
#36850 merged
Mar 20, 2025 -
Gemma3: fix test
#36820 merged
Mar 20, 2025 -
[torchao] revert to get_apply_tensor_subclass
#36849 merged
Mar 20, 2025 -
Add model visual debugger
#36798 merged
Mar 20, 2025 -
Add Prompt Depth Anything Model
#35401 merged
Mar 20, 2025 -
Refactor Attention implementation for ViT-based models
#36545 merged
Mar 20, 2025 -
DeepSpeed tensor parallel+ZeRO
#36825 merged
Mar 20, 2025 -
Support loading Quark quantized models in Transformers
#36372 merged
Mar 20, 2025 -
Use pyupgrade --py39-plus to improve code
#36843 merged
Mar 20, 2025 -
Fix hqq skipped modules and dynamic quant
#36821 merged
Mar 20, 2025 -
Fix ONNX export for sequence classification head
#36332 merged
Mar 20, 2025 -
Shieldgemma2
#36678 merged
Mar 20, 2025 -
Fix: remove the redundant snippet of _whole_word_mask
#36759 merged
Mar 20, 2025 -
Gemma 3: Adding explicit GenerationConfig and refactoring conversion …
#36833 merged
Mar 20, 2025 -
Fix import for torch 2.0, 2.1 - guard typehint for "device_mesh"
#36768 merged
Mar 20, 2025 -
Update min safetensors bis
#36823 merged
Mar 20, 2025 -
[generate] clarify docstrings: when to inherit GenerationMixin
#36605 merged
Mar 20, 2025 -
[modular] Sort modular skips
#36304 merged
Mar 20, 2025 -
Pass state dict
#35234 merged
Mar 20, 2025 -
[qwen2 audio] remove redundant code and update docs
#36282 merged
Mar 20, 2025 -
Update deprecated Jax calls
#35919 merged
Mar 20, 2025 -
Fix fp16 ONNX export for RT-DETR and RT-DETRv2
#36460 merged
Mar 20, 2025 -
Pass num_items_in_batch directly to loss computation
#36753 merged
Mar 20, 2025 -
Saving Trainer.collator.tokenizer when Trainer.processing_class is None
#36552 merged
Mar 20, 2025 -
fix tiktoken convert to pass AddedToken to Tokenizer
#36566 merged
Mar 20, 2025 -
[ForCausalLMLoss] allow users to pass shifted labels
#36607 merged
Mar 20, 2025 -
Disable inductor config setter by default
#36608 merged
Mar 20, 2025 -
Fix swanlab global step
#36728 merged
Mar 20, 2025 -
rewrite main method in Qwen2, making it more clear
#36772 merged
Mar 20, 2025 -
Move the warning to the documentation for DataCollatorWithFlattening
#36707 merged
Mar 20, 2025 -
Remove our AdamW implementation
#36177 merged
Mar 19, 2025 -
Update configuration_qwen2.py
#36735 merged
Mar 19, 2025 -
quick fix fast_image_processor register error
#36716 merged
Mar 19, 2025 -
Add Space to Bitsandbytes doc
#36834 merged
Mar 19, 2025 -
Support traceable DynamicCache
#36311 merged
Mar 19, 2025 -
One more fix for reviewer assignment
#36829 merged
Mar 19, 2025 -
[gemma 3] multimodal checkpoints + AutoModelForCausalLM
#36741 merged
Mar 19, 2025 -
enable OffloadedCache on XPU from PyTorch 2.7
#36654 merged
Mar 19, 2025 -
Add option for ao base configs
#36526 merged
Mar 19, 2025 -
Add attention visualization tool
#36630 merged
Mar 19, 2025 -
[Generation] remove leftover code from end-to-end compilation
#36685 merged
Mar 19, 2025 -
Fix Device map for bitsandbytes tests
#36800 merged
Mar 19, 2025 -
Remove "dist": "loadfile" for pytest for CircleCI jobs
#36811 merged
Mar 19, 2025 -
fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model
#36572 merged
Mar 19, 2025 -
Expectations test utils
#36569 merged
Mar 18, 2025 -
[generate] ✨ vectorized beam search ✨
#35802 merged
Mar 18, 2025 -
Support custom docstrings in modular
#36726 merged
Mar 18, 2025 -
Fix chameleon's TypeError because inputs_embeds may be None
#36673 merged
Mar 18, 2025 -
Fix casting dtype for quantization
#36799 merged
Mar 18, 2025 -
Fix Mistral3 tests
#36797 merged
Mar 18, 2025 -
Loading optimizations
#36742 merged
Mar 18, 2025 -
Update SHA for tj-actions/changed-files
#36795 merged
Mar 18, 2025 -
fix hqq due to recent modeling changes
#36771 merged
Mar 18, 2025 -
Add Mistral3
#36790 merged
Mar 18, 2025 -
Fix gemma3_text tokenizer in mapping
#36793 merged
Mar 18, 2025 -
Fixing typo in gemma3 image_processor_fast and adding a small test
#36776 merged
Mar 18, 2025 -
chore: fix typos in tests directory
#36785 merged
Mar 18, 2025 -
fix typos in the tests directory
#36717 merged
Mar 17, 2025 -
doc: Clarify is_decoder usage in PretrainedConfig documentation
#36724 merged
Mar 17, 2025 -
[docs] Update README
#36265 merged
Mar 17, 2025 -
[CI] remove redundant checks in test_eager_matches_sdpa_inference
#36740 merged
Mar 17, 2025 -
[MINOR:TYPO] Update hubert.md
#36733 merged
Mar 17, 2025 -
Fix TrainingArguments.torch_empty_cache_steps post_init check
#36734 merged
Mar 17, 2025 -
Fix test isolation for clear_import_cache utility
#36345 merged
Mar 17, 2025 -
fix xpu tests
#36656 merged
Mar 17, 2025 -
Allow ray datasets to be used with trainer
#36699 merged
Mar 17, 2025 -
fix can_generate
#36570 merged
Mar 17, 2025 -
enable/disable compile for quants methods
#36519 merged
Mar 17, 2025 -
🚨🚨🚨 Fix sdpa in sam and refactor relative position embeddings
#36422 merged
Mar 17, 2025 -
Fix grad accum arbitrary value
#36691 merged
Mar 14, 2025 -
Fix post_init() code duplication
#36727 merged
Mar 14, 2025 -
🌐 [i18n-KO] Translated codegen.md to Korean
#36698 merged
Mar 14, 2025 -
[tests] Parameterized test_eager_matches_sdpa_inference
#36650 merged
Mar 14, 2025 -
Try working around the processor registration bugs
#36184 merged
Mar 14, 2025 -
Fix/best model checkpoint fix
#35885 merged
Mar 14, 2025 -
[model loading] don't gc.collect() if only 1 shard is used
#36721 merged
Mar 14, 2025 -
Cleanup the regex used for doc preprocessing
#36648 merged
Mar 14, 2025 -
Make the flaky list a little more general
#36704 merged
Mar 14, 2025 -
Gemma3 processor typo
#36710 merged
Mar 14, 2025 -
Add support for fast image processors in add-new-model-like CLI
#36313 merged
Mar 13, 2025 -
Final CI cleanup
#36703 merged
Mar 13, 2025 -
Add GGUF support to T5-Encoder
#36700 merged
Mar 13, 2025 -
Handling an exception related to HQQ quantization in modeling
#36702 merged
Mar 13, 2025 -
fix: fsdp sharded state dict won't work for save_only_model knob
#36627 merged
Mar 13, 2025 -
Add loading speed test
#36671 merged
Mar 13, 2025 -
[CI] Automatic rerun of certain test failures
#36694 merged
Mar 13, 2025 -
chore: fix typos in utils module
#36668 merged
Mar 13, 2025 -
Fix dtype for params without tp_plan
#36681 merged
Mar 13, 2025 -
fix type annotation for ALL_ATTENTION_FUNCTIONS
#36690 merged
Mar 13, 2025 -
Change Qwen2_VL image processors to have init and call accept the same kwargs
#36207 merged
Mar 13, 2025 -
Upgrading torch version and cuda version in quantization docker
#36264 merged
Mar 13, 2025 -
fix wandb hp search unable to resume from sweep_id
#35883 merged
Mar 13, 2025 -
Changing the test model in Quanto kv cache
#36670 merged
Mar 13, 2025 -
Fix slicing for 0-dim param
#36580 merged
Mar 13, 2025 -
Update config.torch_dtype correctly
#36679 merged
Mar 13, 2025 -
[Cache] Don't initialize the cache on meta device
#36543 merged
Mar 13, 2025 -
Fix rescale normalize inconsistencies in fast image processors
#36388 merged
Mar 13, 2025 -
Refactor siglip2 fast image processor
#36406 merged
Mar 13, 2025 -
Remove differences between init and preprocess kwargs for fast image processors
#36186 merged
Mar 12, 2025 -
[quants] refactor logic for modules_to_not_convert
#36672 merged
Mar 12, 2025 -
Remove hardcoded slow image processor class in processors supporting fast ones
#36266 merged
Mar 12, 2025 -
Fix Failing GPTQ tests
#36666 merged
Mar 12, 2025 -
Don't accidentally mutate the base_model_tp_plan
#36677 merged
Mar 12, 2025 -
[core] Large/full refactor of from_pretrained
#36033 merged
Mar 12, 2025 -
Fix bnb regression due to empty state dict
#36663 merged
Mar 12, 2025 -
[CI] gemma 3 make fix-copies
#36664 merged
Mar 12, 2025 -
fix block mask typing
#36661 merged
Mar 12, 2025 -
HPU support
#36424 merged
Mar 12, 2025 -
Gemma3
#36658 merged
Mar 12, 2025 -
fix typos in the docs directory
#36639 merged
Mar 11, 2025 -
Fix gguf docs
#36601 merged
Mar 11, 2025 -
Remove research projects
#36645 merged
Mar 11, 2025 -
[docs] Update docs dependency
#36635 merged
Mar 11, 2025 -
Stop warnings from unnecessary torch.tensor() overuse
#36538 merged
Mar 11, 2025 -
Remove remote code warning
#36285 merged
Mar 11, 2025 -
Fix AriaForConditionalGeneration flex attn test
#36604 merged
Mar 11, 2025 -
Proper_flex
#36643 merged
Mar 11, 2025 -
Fix bugs in mllama image processing
#36156 merged
Mar 11, 2025 -
Refactor some core stuff
#36539 merged
Mar 11, 2025 -
[docs] Serving LLMs
#36522 merged
Mar 10, 2025 -
chore: fix typos in language models
#36586 merged
Mar 10, 2025 -
Fix auto-assign reviewers
#36631 merged
Mar 10, 2025 -
[HybridCache] disable automatic compilation
#36620 merged
Mar 10, 2025 -
Fix check for XPU. PyTorch >= 2.6 no longer needs ipex.
#36593 merged
Mar 7, 2025 -
Fixed datatype related issues in DataCollatorForLanguageModeling
#36457 merged
Mar 7, 2025 -
Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/decision_transformer
#36582 merged
Mar 7, 2025 -
Update "who to tag" / "who can review"
#36394 merged
Mar 7, 2025 -
Update chat_extras.md with content correction
#36599 merged
Mar 7, 2025 -
Github action for auto-assigning reviewers
#35846 merged
Mar 7, 2025 -
Export base streamer.
#36500 merged
Mar 7, 2025 -
avoid errors when the size of input_ids passed to PrefixConstrainedLogitsProcessor is zero
#36489 merged
Mar 7, 2025 -
Mention UltraScale Playbook 🌌 in docs
#36589 merged
Mar 6, 2025 -
fix: argument
#36558 merged
Mar 6, 2025 -
[XGLM] tag tests as slow
#36592 merged
Mar 6, 2025 -
[bark] fix loading of generation config
#36587 merged
Mar 6, 2025 -
Integrate SwanLab for offline/online experiment tracking and local visualization
#36433 merged
Mar 6, 2025 -
Modular Conversion --fix_and_overwrite on Windows
#36583 merged
Mar 6, 2025 -
Delete redundant if case in model_utils
#36559 merged
Mar 6, 2025 -
Bump transformers from 4.38.0 to 4.48.0 in /examples/research_projects/pplm
#36540 merged
Mar 6, 2025 -
chore: enhance message descriptions in parameters, comments, logs and docstrings
#36554 merged
Mar 6, 2025 -
Fix typos.
#36551 merged
Mar 6, 2025 -
Fix typos in tests
#36547 merged
Mar 5, 2025 -
guard torch version for uint16
#36520 merged
Mar 5, 2025 -
chore: enhance messages in docstrings
#36525 merged
Mar 4, 2025 -
Fix links in quantization doc
#36528 merged
Mar 4, 2025 -
Fix bamba tests amd
#36535 merged
Mar 4, 2025 -
chore: Fix typos in docs and examples
#36524 merged
Mar 4, 2025 -
Add aya
#36521 merged
Mar 4, 2025 -
[docs] Redesign
#31757 merged
Mar 3, 2025 -
Remove unused code
#36459 merged
Mar 3, 2025 -
[Style] fix E721 warnings
#36474 merged
Mar 3, 2025 -
Fix edge case for continue_final_message
#36404 merged
Mar 3, 2025 -
Fix pipeline+peft interaction
#36480 merged
Mar 3, 2025 -
chore: fix message descriptions in arguments and comments
#36504 merged
Mar 3, 2025 -
Fix some typos in docs
#36502 merged
Mar 3, 2025 -
fix torch_dtype, contiguous, and load_state_dict regression
#36512 merged
Mar 3, 2025 -
Fix kwargs UserWarning in SamImageProcessor
#36479 merged
Mar 3, 2025 -
Check TRUST_REMOTE_CODE for RealmRetriever for security
#36511 merged
Mar 3, 2025 -
Fix loading zero3 weights
#36455 merged
Mar 3, 2025 -
Fix _load_state_dict_into_meta_model with device_map=None
#36488 merged
Mar 2, 2025 -
Fix couples of issues from #36335
#36453 merged
Mar 1, 2025 -
Add Got-OCR 2 Fast image processor and refactor slow one
#36185 merged
Mar 1, 2025 -
[docs] fix bug in deepspeed config
#36081 merged
Feb 28, 2025 -
Fix loading models with mismatched sizes
#36463 merged
Feb 28, 2025 -
[GroundingDino] Fix grounding dino loss 🚨
#31828 merged
Feb 27, 2025 -
Fix hub_retry
#36449 merged
Feb 27, 2025 -
Lazy import libraries in src/transformers/image_utils.py
#36435 merged
Feb 27, 2025 -
[generate] torch.distributed-compatible DynamicCache
#36373 merged
Feb 27, 2025 -
[save_pretrained ] Skip collecting duplicated weight
#36409 merged
Feb 27, 2025 -
Add contents: write
#36445 merged
Feb 27, 2025 -
Fix another permission
#36444 merged
Feb 27, 2025 -
Fix permission
#36443 merged
Feb 27, 2025 -
Change PR to draft when it is (re)opened
#36417 merged
Feb 27, 2025 -
restrict cache allocator to non quantized model
#36428 merged
Feb 26, 2025 -
Fix Expected output for compressed-tensors tests
#36425 merged
Feb 26, 2025 -
Update from_pretrained to make TP a first-class citizen
#36335 merged
Feb 26, 2025
129 Pull requests opened by 93 people
Add fetch_paginated_github_data to deduplicate GitHub API pagination …
#36432 opened
Feb 26, 2025 -
Fix model saving bug post training with tensor parallel in Accelerate
#36434 opened
Feb 26, 2025 -
Add PlainDETR
#36437 opened
Feb 26, 2025 -
add FlashAttentionKwargs and seq_idx to flat collator
#36456 opened
Feb 27, 2025 -
Customize docstrings fast image processor
#36466 opened
Feb 27, 2025 -
Add NVIDIA Cosmos
#36476 opened
Feb 28, 2025 -
Fix incorrect attention mask truncate in WhisperFlashAttention2
#36477 opened
Feb 28, 2025 -
Sanitize Model Module Names to Follow Python Conventions
#36478 opened
Feb 28, 2025 -
Export T5 (encoder-decoder) to ExecuTorch
#36486 opened
Mar 1, 2025 -
Allow OOV Image Token for LLaVa Next Variants
#36491 opened
Mar 2, 2025 -
Create and Expose SamVisionModel as public for better accessibility
#36493 opened
Mar 2, 2025 -
Add an event related to forward in the TrainerCallback
#36496 opened
Mar 2, 2025 -
Refactor object-detection models
#36514 opened
Mar 3, 2025 -
[Validation] First implementation of `@strict_dataclass` from `huggingface_hub`
#36534 opened
Mar 4, 2025 -
add-long-vita
#36553 opened
Mar 5, 2025 -
Fix edge case for tokenize (#36277)
#36555 opened
Mar 5, 2025 -
add-LongVITAModel
#36556 opened
Mar 5, 2025 -
fix for loading gguf quantized model
#36563 opened
Mar 5, 2025 -
Allow saving and loading multiple "raw" chat template files
#36588 opened
Mar 6, 2025 -
Attention mechanisms elaboration
#36597 opened
Mar 6, 2025 -
[audio utils] fix fft_bin_width computation
#36603 opened
Mar 7, 2025 -
Add StableAdamW Optimizer
#36606 opened
Mar 7, 2025 -
Fixed 30s timestamp resets in Whisper long-form transcription
#36612 opened
Mar 7, 2025 -
Add Distill Any Depth
#36614 opened
Mar 8, 2025 -
[WIP] Add support to load models with transforms
#36621 opened
Mar 9, 2025 -
[WiP] Add Aimv2 model
#36625 opened
Mar 10, 2025 -
[Whisper] 🚨 Fix pipeline word timestamp: timestamp token is end of token time !!!
#36632 opened
Mar 10, 2025 -
Refine parameter type annotations
#36644 opened
Mar 11, 2025 -
Fix device issue in modeling_qwen2
#36647 opened
Mar 11, 2025 -
[i18n-KO] Translated `keypoint_detection.md` to Korean
#36649 opened
Mar 11, 2025 -
Fixes DynamicCache export issues due to control flow and inplace modifications
#36652 opened
Mar 11, 2025 -
Update quantizer_bnb_4bit.py
#36669 opened
Mar 12, 2025 -
don't pass NoneType for keep_in_fp32_modules
#36675 opened
Mar 12, 2025 -
Support batch size > 1 image-text inference
#36682 opened
Mar 12, 2025 -
prune LM Head for USD
#36695 opened
Mar 13, 2025 -
[Feature] Support using FlashAttention2 on Ascend NPU
#36696 opened
Mar 13, 2025 -
Limit numpy version to <2.0.0
#36706 opened
Mar 13, 2025 -
Fix long lagging when streaming text without spaces and CJK chars
#36708 opened
Mar 13, 2025 -
[WIP] Add DINO DETR Model to HuggingFace Transformers
#36711 opened
Mar 14, 2025 -
fix whisper re-compile
#36712 opened
Mar 14, 2025 -
Add CSM model
#36719 opened
Mar 14, 2025 -
Fix generation using flash-attention and static cache
#36729 opened
Mar 14, 2025 -
Fix image processor speedup fixed
#36732 opened
Mar 14, 2025 -
Convert _VALID_DICT_FIELDS to class attribute for shared dict parsing in subclasses
#36736 opened
Mar 15, 2025 -
[WIP] PagedAttention + Prefix Cache for FlashAttention2
#36737 opened
Mar 15, 2025 -
🌐 [i18n-KO] Translated `qwen2_vl.md` to Korean
#36750 opened
Mar 16, 2025 -
Add Qwen2.5-Omni
#36752 opened
Mar 16, 2025 -
🌐 [i18n-KO] Translated 'serving.md' to Korean
#36756 opened
Mar 17, 2025 -
🌐 [i18n-KO] Translated `gpu_selection.md` to Korean
#36757 opened
Mar 17, 2025 -
feat: expose the strict flag to allow catching missing model layers while loading a checkpoint
#36760 opened
Mar 17, 2025 -
🌐 [i18n-KO] Translated `electra.md` to Korean
#36763 opened
Mar 17, 2025 -
Add support for audios in apply_chat_template
#36770 opened
Mar 17, 2025 -
Use public export API on torch 2.5 and future
#36781 opened
Mar 18, 2025 -
Fix attention_mask dimension issue in GPT2Model
#36782 opened
Mar 18, 2025 -
Create modeling_ngen3.py for NGen3
#36787 opened
Mar 18, 2025 -
Update configuration_auto.py for NGen3
#36791 opened
Mar 18, 2025 -
Refactor `return_dict` logic to remove complicated if/else paths
#36794 opened
Mar 18, 2025 -
[don't merge] check tokenizer ci job
#36796 opened
Mar 18, 2025 -
Add Granite Speech Support
#36801 opened
Mar 18, 2025 -
Add long vita
#36807 opened
Mar 19, 2025 -
Support loading custom models (`trust_remote_code=True`) in offline mode from local
#36808 opened
Mar 19, 2025 -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 opened
Mar 19, 2025 -
Use `lru_cache` for tokenization tests
#36818 opened
Mar 19, 2025 -
Dummies
#36827 opened
Mar 19, 2025 -
[Modeling] Load FP8 safetensors such as DeepSeek
#36828 opened
Mar 19, 2025 -
gemma3 fp16 fix
#36832 opened
Mar 19, 2025 -
Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag
#36835 opened
Mar 19, 2025 -
Remove unnecessary attr assignment
#36837 opened
Mar 19, 2025 -
Move `return_dict` logic into `can_return_tuple` decorator
#36838 opened
Mar 19, 2025 -
Haocheng lu
#36839 opened
Mar 19, 2025 -
fix pegasus init weights and other copied models
#36844 opened
Mar 20, 2025 -
Add support for specifying revisions when pushing to Hub via internal Trainer call
#36852 opened
Mar 20, 2025 -
fix: prevent input side-effects in processor text args
#36866 opened
Mar 20, 2025 -
Only count num items in batch when needed
#36867 opened
Mar 20, 2025 -
Fix warning message for PEFT models in text-generation pipeline #36783
#36868 opened
Mar 20, 2025 -
Improve Model Download Speeds By ~3x For Large Models
#36870 opened
Mar 21, 2025 -
Adding Qwen3 and Qwen3MoE
#36878 opened
Mar 21, 2025 -
Fix `resume_from_checkpoint` not recognising `"last-checkpoint"`
#36883 opened
Mar 21, 2025 -
Optimize `to_py_obj` for python-native numeric lists and scalars
#36885 opened
Mar 21, 2025 -
Fix warning message for PEFT models in text-generation pipeline #36783
#36887 opened
Mar 21, 2025 -
Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0)
#36891 opened
Mar 21, 2025 -
[WIP] Computer vision util: vision visualizer
#36892 opened
Mar 21, 2025 -
Enable tracing for Moshi
#36894 opened
Mar 21, 2025 -
Add RF-DETR
#36895 opened
Mar 21, 2025 -
Adding ArlowGPT
#36899 opened
Mar 22, 2025 -
Add NGen3
#36901 opened
Mar 22, 2025 -
LogfireCallback: Integrating Logfire with Hugging Face’s Trainer
#36905 opened
Mar 22, 2025 -
fix cached file error when repo type is dataset
#36909 opened
Mar 23, 2025 -
Limit number of evaluation samples processed during training
#36916 opened
Mar 24, 2025 -
[qwen2-audio] remove default template
#36919 opened
Mar 24, 2025 -
Allow disabling `deformable_detr` kernels
#36927 opened
Mar 24, 2025 -
Remove the redundant shift during the loss computation in the Moshi m…
#36928 opened
Mar 24, 2025 -
Aligning modling code for GPT2 to work with vLLM (fallback)
#36934 opened
Mar 24, 2025 -
[3/N] Use pyupgrade --py39-plus to improve code
#36936 opened
Mar 24, 2025 -
Static cache should support indexing
#36943 opened
Mar 24, 2025 -
Improve typing in TrainingArgument
#36944 opened
Mar 25, 2025 -
fix(qwen): fix shape error when using tp
#36947 opened
Mar 25, 2025 -
Update image_processing_qwen2_vl.py. Fix bug.
#36948 opened
Mar 25, 2025 -
Avoid unnecessary tensor copy in loss computing
#36950 opened
Mar 25, 2025 -
Added Sapnous Architecture
#36952 opened
Mar 25, 2025 -
Skip code `307` in `RequestCounter`
#36953 opened
Mar 25, 2025 -
[chat templates] support loading audio from video
#36955 opened
Mar 25, 2025 -
fix: Fully remove legacy cache from Llama
#36958 opened
Mar 25, 2025 -
Remove low_cpu_mem_usage and _fast_init
#36963 opened
Mar 25, 2025 -
More ReDOS fixes!
#36964 opened
Mar 25, 2025 -
[phi-4] use mel filters from audio utils
#36966 opened
Mar 25, 2025 -
Add new dim to `num_items_in_batch` if necessary
#36967 opened
Mar 25, 2025 -
Make executorch integration more seamless by analyzing model signature
#36969 opened
Mar 25, 2025 -
Refactor image processor phi4
#36976 opened
Mar 25, 2025 -
Add device workaround for int4 weight only quantization after API update
#36980 opened
Mar 25, 2025 -
Refactor attention for SigLIP based models
#36981 opened
Mar 25, 2025 -
clean pipeline question_answering.
#36986 opened
Mar 26, 2025 -
fix comment misdirection during scaling loss
#36987 opened
Mar 26, 2025 -
fix transformers_cli import relative path issue
#36989 opened
Mar 26, 2025 -
Gaudi: fix the pipeline failure with hpu device
#36990 opened
Mar 26, 2025 -
Set weights_only in torch.load
#36991 opened
Mar 26, 2025 -
fix and enhance pipeline_webserver.md
#36992 opened
Mar 26, 2025 -
remove redundant code in trainer
#36994 opened
Mar 26, 2025 -
[Phi4] add multimodal chat template
#36996 opened
Mar 26, 2025 -
Add Fast SamImageProcessor
#36999 opened
Mar 26, 2025 -
Replace default split function with jnp.split() in flax models
#37001 opened
Mar 26, 2025 -
Fix typing for None valued variables
#37004 opened
Mar 26, 2025 -
[Fast Processor] BEiT
#37005 opened
Mar 26, 2025 -
Remove deprecated batch_size argument
#37007 opened
Mar 26, 2025 -
Skip FP8 linear tests
#37008 opened
Mar 26, 2025 -
Export Whisper to ExecuTorch
#37009 opened
Mar 26, 2025 -
Fix AttentionInterface following feedback
#37010 opened
Mar 26, 2025 -
Add Fast Chinese-CLIP Processor
#37012 opened
Mar 26, 2025 -
[generate, cache] handle more complex device maps
#37014 opened
Mar 26, 2025
162 Issues closed by 47 people
Learning rate logging off by one training step
#35942 closed
Mar 26, 2025 -
ValueError: `run_compressed` is only supported for quantized_compressed models
#36915 closed
Mar 26, 2025 -
Recent update: configuration_eurobert.py not found -
#36983 closed
Mar 26, 2025 -
Issue with Progressive Generation Using inputs_embeds and past_key_values
#35707 closed
Mar 26, 2025 -
RWKV CUDA error: an illegal memory access was encountered during training from scratch
#35805 closed
Mar 26, 2025 -
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 closed
Mar 26, 2025 -
Token healing throws error with "Qwen/Qwen2.5-Coder-7B-Instruct"
#36210 closed
Mar 26, 2025 -
[bug] use_gather_object is not respected after the first eval in trainer
#36213 closed
Mar 26, 2025 -
Error: TypeError: argument 'ids': 'float' object cannot be interpreted as an integer
#36984 closed
Mar 26, 2025 -
Clarification on Commercial License Impact of LayoutLMv3ImageProcessor within UdopProcessor
#36931 closed
Mar 25, 2025 -
ImportError: cannot import name 'AdamW' from 'transformers'
#36954 closed
Mar 25, 2025 -
AutoTokenizer/Processor does not work with Mistral3 models
#36968 closed
Mar 25, 2025 -
Ruff update
#36705 closed
Mar 25, 2025 -
torchrun breaks with load_model_at_end and with metric_for_best_model=eval_f1 on question_answering example
#30819 closed
Mar 25, 2025 -
`Mllama` not supported by `AutoModelForCausalLM` after updating `transformers` to `4.50.0`
#36926 closed
Mar 25, 2025 -
Florence2 stopped working after upgrade to 4.50.0 ("Unrecognized configuration class")
#36886 closed
Mar 25, 2025 -
Design question for integrating new model to Transformers?
#36784 closed
Mar 25, 2025 -
Add seed to data collator classes
#36655 closed
Mar 24, 2025 -
Torch -> ONNX doesn't work after upgrading transformers to 4.49.0
#36276 closed
Mar 24, 2025 -
<spam>
#36924 closed
Mar 24, 2025 -
llama tokenizer encode -> decode is not same
#36325 closed
Mar 24, 2025 -
tj-actions/changed-files action compromised
#36761 closed
Mar 24, 2025 -
Some of test/utils tests fail being invalidated by tests/utils/test_import_utils.py::test_clear_import_cache
#36334 closed
Mar 24, 2025 -
MacOs: register_pytree_node got an unexpected keyword argument 'flatten_with_keys_fn'
#36906 closed
Mar 24, 2025 -
Issue with update
#36888 closed
Mar 24, 2025 -
Trainer: TensorBoardCallback not working for "on_save" and "on_save_end" events
#35612 closed
Mar 24, 2025 -
Pipeline cannot guess which processor to use with Gemma 3
#36911 closed
Mar 23, 2025 -
Unable to export GLM models to ONNX
#35021 closed
Mar 23, 2025 -
`modular_model_converter` can not handle objects import via try - except
#35414 closed
Mar 23, 2025 -
`TFViTModel` and `interpolate_pos_encoding=True`
#36155 closed
Mar 23, 2025 -
[BART] Cannot copy out of meta tensor; no data!
#36247 closed
Mar 21, 2025 -
Bug introduced in `from_pretrained` `v4.48.3`..`v4.49.0`
#36258 closed
Mar 21, 2025 -
<spam>
#36876 closed
Mar 21, 2025 -
Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
#36124 closed
Mar 21, 2025 -
Allow setting a seed for DataCollatorForLanguageModeling
#36357 closed
Mar 20, 2025 -
LlamaAttention has no attribute `rotary_emb` (4.50.0.dev0)
#36758 closed
Mar 20, 2025 -
GPT2 repetition of words in output
#36848 closed
Mar 20, 2025 -
num_items_in_batch unexpected in vision encoder decoder
#36744 closed
Mar 20, 2025 -
Convert RT-DETR model to coreml
#35905 closed
Mar 20, 2025 -
[bug] fast_image_processor register error
#36715 closed
Mar 19, 2025 -
When what needs to be loaded is in the cache directory, there is no need to make a request to the remote
#36762 closed
Mar 19, 2025 -
In the _speculative_sampling function, it seems that the "squeeze" method is being used incorrectly.
#36810 closed
Mar 19, 2025 -
AttributeError: 'Gemma3Config' object has no attribute 'vocab_size'
#36683 closed
Mar 19, 2025 -
text-to-video_app
#36747 closed
Mar 19, 2025 -
model from_pretrained bug in 4.50.dev0 in these days
#36506 closed
Mar 19, 2025 -
Subtle difference with Pytorch AdamW?
#35504 closed
Mar 19, 2025 -
Qwen2VL exhibits significant performance differences under different attention implementations.
#35749 closed
Mar 19, 2025 -
[Phi-3-mini-128k-instruct] Difference of encodings for Slow and Fast Tokenizer
#35973 closed
Mar 19, 2025 -
Training loss not showing with trainer
#36102 closed
Mar 19, 2025 -
Gemma3 minimal fine tuning example?
#36714 closed
Mar 18, 2025 -
Shape mismatch in RoPE embeddings gpt_neox model when rotary_ndims is odd
#35233 closed
Mar 18, 2025 -
incorrect special_tokens_mask
#35897 closed
Mar 18, 2025 -
Llama tokenizer newline character inconsistency
#35923 closed
Mar 18, 2025 -
flex_attention does not output the full attention_weights with output_attention option
#36096 closed
Mar 18, 2025 -
bug in save checkpoint
#36099 closed
Mar 18, 2025 -
qwen2_5_vl processor padding side is wrong.
#36100 closed
Mar 18, 2025 -
ValueError: weight is on the meta device, we need a `value` to put in on 0. `Gemma3`
#36766 closed
Mar 17, 2025 -
Misleading documentation for `is_decoder` configuration parameter
#36482 closed
Mar 17, 2025 -
On MoE implementation in HuggingFace
#36730 closed
Mar 17, 2025 -
bus error on version 4.43.0 with pretrained community CLIP model - MacOS
#33357 closed
Mar 17, 2025 -
Cannot load siglip2 processor
#36665 closed
Mar 16, 2025 -
SFTConfig.__init__() got an unexpected keyword argument 'optimizers'
#36749 closed
Mar 16, 2025 -
Model.generate use_cache=True generates different results than use_cache=False
#36536 closed
Mar 16, 2025 -
past_key_values type support bug
#36057 closed
Mar 16, 2025 -
TypeError: empty() missing 1 required positional arguments: "size"
#36061 closed
Mar 16, 2025 -
Transformers can create unconventional python module names when loading certain repositories
#35570 closed
Mar 15, 2025 -
ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#36010 closed
Mar 15, 2025 -
[feature request] Callback handler event after forward pass in Trainer
#36012 closed
Mar 15, 2025 -
AMD CI tracking issue
#36019 closed
Mar 15, 2025 -
Issue in resuming finetuning Llama 3.1 Instruct Model
#36035 closed
Mar 15, 2025 -
Initializing via AutoImageProcessor before AutoProcessor is imported causes `AttributeError`
#34307 closed
Mar 14, 2025 -
Trainer sets `state.best_model_checkpoint` even when it doesn't save there; leads to training crash
#35609 closed
Mar 14, 2025 -
'MERTConfig' object has no attribute 'conv_pos_batch_norm'
#36134 closed
Mar 14, 2025 -
Some questions of `Gemma3` processor
#36701 closed
Mar 14, 2025 -
NotImplementedError: aten::_log_softmax_backward_data with SparseCUDA backend
#36674 closed
Mar 14, 2025 -
Component loading incorrect dtype
#36686 closed
Mar 13, 2025 -
`disable_compile` not honored as a kwarg in generate
#36544 closed
Mar 13, 2025 -
AutoModel failed with empty tensor error
#36579 closed
Mar 13, 2025 -
Some methods in TrainerControl seem not to be utilized.
#36576 closed
Mar 13, 2025 -
save_only_model with FSDP throws FileNotFoundError error
#36626 closed
Mar 13, 2025 -
Cannot import 'GenerationOutput' in 4.48.1
#35957 closed
Mar 13, 2025 -
GPTQ quantization on Jetson Orin Nano
#36139 closed
Mar 12, 2025 -
past_key_values not being set in model_inputs keys
#36001 closed
Mar 12, 2025 -
The number of safetensors files is different when using CPU and CUDA.
#36595 closed
Mar 11, 2025 -
Downloading models in distributed training
#36414 closed
Mar 11, 2025 -
Warning related to torch.tensor() usage in transformers.models.encodec.modeling_encodec.py (Version 4.47.0)
#36533 closed
Mar 11, 2025 -
Loading a pipeline with `trust_remote_code=True` raises warning
#36273 closed
Mar 11, 2025 -
Extract embeddings of many seqs using ESM2
#36641 closed
Mar 11, 2025 -
Error faced during Finetuning Deepseek-vl2
#36633 closed
Mar 11, 2025 -
paligemma2-3B-mix in version4.49.0 not use GPU and 4.50.0.dev broken
#36575 closed
Mar 11, 2025 -
Inconsistent Outputs When Using Flash Attention 2 and SDPA Attention with Attention Mask
#36585 closed
Mar 11, 2025 -
AttributeError: 'MERTConfig' object has no attribute 'conv_pos_batch_norm'
#35656 closed
Mar 10, 2025 -
Why are there so many variables named layrnorm in the codebase?
#36623 closed
Mar 10, 2025 -
Memory Access out of bounds in mra/cuda_kernel.cu::index_max_cuda_kernel()
#35507 closed
Mar 10, 2025 -
Very slow to load DeepSeek-V3 int4 model and device_map="auto"/"sequential" bug
#35522 closed
Mar 10, 2025 -
adalomo and deepspeed zero3 offload error
#35977 closed
Mar 10, 2025 -
size mismatch for lm_head when finetuning QWEN2.5
#36550 closed
Mar 10, 2025 -
Llama3 tokenizer decode is incorrect for ' ...' with leading space
#36622 closed
Mar 9, 2025 -
Tokenizer does not split text according to newly added input tokens
#35447 closed
Mar 9, 2025 -
Can't use Trainer on mps device
#35954 closed
Mar 9, 2025 -
Significant Increase in Computation Time When Using Attention Mask in SDPA Attention
#36584 closed
Mar 8, 2025 -
Accidentally allocating 2x memory in new caching_allocator_warmup
#36483 closed
Mar 7, 2025 -
Open Object Detection Leaderboard: Model Requests not working
#36034 closed
Mar 7, 2025 -
TypeError: LlavaProcessor: got multiple values for keyword argument 'images'
#36578 closed
Mar 7, 2025 -
Attention can be None in ModernBertForSequenceClassification
#35917 closed
Mar 7, 2025 -
Lora_B weight becomes 0 when using AutoModel
#36594 closed
Mar 6, 2025 -
Do trailing padding tokens get a forward pass?
#36565 closed
Mar 6, 2025 -
Init on meta device and then materialize on gpu leads to very large errors
#36577 closed
Mar 6, 2025 -
how to use transformers with musicgen with float16
#36546 closed
Mar 6, 2025 -
AttributeError: 'dict' object has no attribute '_attn_implementation_internal'
#35900 closed
Mar 6, 2025 -
Dtensor support requires torch>=2.5.1
#36472 closed
Mar 5, 2025 -
Groq inference provider
#36353 closed
Mar 4, 2025 -
After tokenizers upgrade, the length of the token does not correspond to the length of the model
#36532 closed
Mar 4, 2025 -
Incorrect Whisper long-form decoding timestamps
#31942 closed
Mar 4, 2025 -
Help Understanding Beam Search Scores in Hugging Face (LLaMA + LoRA)
#35618 closed
Mar 4, 2025 -
ERROR: Video features and Video Tokens do not match!!!
#35869 closed
Mar 4, 2025 -
tokenizers.apply_chat_template with continue_final_message=True with </think> token
#36440 closed
Mar 3, 2025 -
tokenizers.apply_chat_template with `continue_final_message=True` with trailing spaces in input
#35433 closed
Mar 3, 2025 -
Confusing behavior when loading PEFT models with pipeline
#36473 closed
Mar 3, 2025 -
`_load_state_dict_into_meta_model` - `'NoneType' object has no attribute 'load_state_dict'`
#36495 closed
Mar 3, 2025 -
GRPO Reward Weight Scheduler
#36490 closed
Mar 3, 2025 -
please support register_full_backward_pre_hook and register_full_backward_hook
#36507 closed
Mar 3, 2025 -
Some Whisper beam search output (sequences_scores, etc.) is lost in _stack_split_outputs
#32373 closed
Mar 3, 2025 -
[DEV Testing] Issues with `test_modeling_common`
#35857 closed
Mar 3, 2025 -
[BUG] NPU ZeRO-3: training a custom model raises "Function SumBackward0 returned an invalid gradient at index 0"
#36387 closed
Mar 3, 2025 -
Load siglip2 error
#36475 closed
Mar 3, 2025 -
`padding_side` is of type `bool` when it should be `Literal['right', 'left']`
#36252 closed
Mar 3, 2025 -
Add Wan model into Transformers
#36494 closed
Mar 2, 2025 -
Bug introduced in `_load_state_dict_into_meta_model` and `to` `v4.49.0`..`v4.50.0.dev`
#36441 closed
Mar 1, 2025 -
`model.config.to_diff_dict()` delivers different result to `model.save_pretrained()`
#35426 closed
Mar 1, 2025 -
AttributeError: 'Config' object has no attribute '_get_non_default_generation_parameters'
#35543 closed
Mar 1, 2025 -
Prompt_ids feature causing repetitions and hallucinations
#35603 closed
Mar 1, 2025 -
convert_llama_weight_to_hf.py
#35820 closed
Mar 1, 2025 -
suppress_tokens=[] should be legal as some older Whisper models rely on this
#36341 closed
Feb 28, 2025 -
model.generate() produces different outputs with padding for flan-t5-small
#36461 closed
Feb 28, 2025 -
Failed to import transformers.models.auto.modeling_auto because numpy.core.multiarray failed to import
#36343 closed
Feb 28, 2025 -
KerasTensor can't be used with TFBertTokenizer
#36462 closed
Feb 28, 2025 -
about siglip2
#36470 closed
Feb 28, 2025 -
AutoModelForObjectDetection isn't working due to wrong output size
#36464 closed
Feb 28, 2025 -
Question for community: We're considering adding `pydantic` as a base requirement to 🤗 `transformers`
#36329 closed
Feb 28, 2025 -
How to change data
#35807 closed
Feb 28, 2025 -
test
#36471 closed
Feb 28, 2025 -
Error splitting the input into NAL units.
#36448 closed
Feb 27, 2025 -
Does KTransformers support concurrent requests when running the quantized DeepSeek-R1? Do the required resources also scale proportionally with concurrency?
#36423 closed
Feb 27, 2025 -
Mamba2 doesn't support Multi-GPU training (fast path)
#35770 closed
Feb 27, 2025 -
TPU Initialization Error with Transformers in Kaggle TPU VM v3-8
#35774 closed
Feb 27, 2025 -
Apply dualpipe from deepseek-v3 to a trainer or model
#36439 closed
Feb 27, 2025
116 Issues opened by 112 people
-
Gemma3: new token <image_soft_token> has been added accidentally
#37011 opened
Mar 26, 2025 -
[Question] Handling of custom flex attention block masks
#37006 opened
Mar 26, 2025 -
GGUF model with architecture gemma3 is not supported yet.
#37002 opened
Mar 26, 2025 -
Add ArlowGPT
#36988 opened
Mar 26, 2025 -
FSDP Not Working For Mamba2
#36982 opened
Mar 25, 2025 -
[Community contributions] Model cards
#36979 opened
Mar 25, 2025 -
[Contributions Welcome] Add Fast Image Processors
#36978 opened
Mar 25, 2025 -
QuestionAnswering for Gemma 3
#36977 opened
Mar 25, 2025 -
Gemma3: Cuda error: misaligned address
#36961 opened
Mar 25, 2025 -
Symbolic trace with past_key_values input is not supported yet for Qwen2.
#36959 opened
Mar 25, 2025 -
Started getting new warnings for gemma3 after upgrading from 4.49.0-gemma3 to 4.50.0
#36942 opened
Mar 24, 2025 -
Add param_to_hook_all_reduce parameter in HF Trainer
#36941 opened
Mar 24, 2025 -
Gemma3 not supported in main branch
#36940 opened
Mar 24, 2025 -
AttributeError: 'HybridCache' object has no attribute 'float' — PaliGemma2 Evaluation Fails with BF16
#36938 opened
Mar 24, 2025 -
python_interpreter.py seems not to support asyncio.run()
#36920 opened
Mar 24, 2025 -
'Cache only has 0 layers' during generation after upgrading Transformers from 4.49 to 4.50
#36913 opened
Mar 24, 2025 -
`lm_head.weight` missing from `convert_mistral_weights_to_hf.STATE_DICT_MAPPING`
#36908 opened
Mar 23, 2025 -
PixtralVisionModel does not support Flash Attention 2.0 yet
#36904 opened
Mar 22, 2025 -
Warning: "No label_names provided for PeftModel" persists despite dataset containing "labels" column
#36902 opened
Mar 22, 2025 -
groot n1
#36900 opened
Mar 22, 2025 -
GPT2Model model output inconsistency between different transformers versions
#36897 opened
Mar 22, 2025 -
Forced to hit `UserWarning` when generating with `temperature=0`
#36896 opened
Mar 21, 2025 -
Add RF-DETR model
#36879 opened
Mar 21, 2025 -
Qwen2-VL-7B-Instruct shape error when using TP=4
#36875 opened
Mar 21, 2025 -
Support for SpatialLM series model
#36874 opened
Mar 21, 2025 -
Optimize tokenizer.decode() Performance for `List[int]` Inputs
#36872 opened
Mar 21, 2025 -
Multiple processor classes have input side-effects
#36865 opened
Mar 20, 2025 -
Gemma3 (and Paligemma) position_ids 1-indexed?
#36856 opened
Mar 20, 2025 -
Facing RunTime Attribute error while running different Flax models for RoFormer
#36854 opened
Mar 20, 2025 -
Transformers_model
#36846 opened
Mar 20, 2025 -
Unable to load google/siglip2-so400m-patch14-384/
#36845 opened
Mar 20, 2025 -
GOT-OCR2 docs indicate model can produce markdown, but it only produces LaTeX.
#36836 opened
Mar 19, 2025 -
Build for Windows and VS 2022 does not compile CUDA sources
#36830 opened
Mar 19, 2025 -
Support for Ovis2 models
#36824 opened
Mar 19, 2025 -
Gemma 3 is broken with fp16
#36822 opened
Mar 19, 2025 -
Need Option to Disable Flash Attention in VideoLLaMA2.1-7B-AV (SiglipVisionModel)
#36819 opened
Mar 19, 2025 -
Add EuroBert Model To Config
#36817 opened
Mar 19, 2025 -
Gemma3 can't be fine-tuned on multi-image examples
#36816 opened
Mar 19, 2025 -
Gemma3
#36815 opened
Mar 19, 2025 -
Not able to trace GPT2DoubleHeadsModel
#36812 opened
Mar 19, 2025 -
Logic Errors in Image_processing_gemma3_fast.py
#36806 opened
Mar 19, 2025 -
Qwen2VLForConditionalGeneration.from_pretrained() hangs with v0.50.0-dev0
#36803 opened
Mar 18, 2025 -
BERT is broken on `v4.49.0-Gemma-3`
#36802 opened
Mar 18, 2025 -
Throw messages in text-generation task with deepseek r1 with PEFTModel
#36783 opened
Mar 18, 2025 -
Please support GGUF format for UMT5EncoderModel
#36774 opened
Mar 17, 2025 -
Inconsistent Documentation for `dataset_index` Requirement Across ViTPose Models
#36773 opened
Mar 17, 2025 -
Add Audio inputs available in apply_chat_template
#36769 opened
Mar 17, 2025 -
Source link to Ray Tune API outdated
#36765 opened
Mar 17, 2025 -
could not parse ModelProto from /home/imss/zxhhhh/llama-3-8b/tokenizer.model
#36764 opened
Mar 17, 2025 -
Add Gemma 3 For Sequence Classification
#36755 opened
Mar 16, 2025 -
Unable to load google/siglip2-base-patch16-naflex
#36754 opened
Mar 16, 2025 -
IdeficsProcessor cannot handle multiple images in one text
#36751 opened
Mar 16, 2025 -
Gemma 3 1B - TypeError: 'NoneType' object is not callable
#36745 opened
Mar 15, 2025 -
Model Card to include key information (e.g. max_sequence_length, etc.)
#36743 opened
Mar 15, 2025 -
Unable to deploy Gemma 3 on AWS SageMaker due to lack of support in transformers release
#36738 opened
Mar 15, 2025 -
Error when tokenizer is set to string: `AttributeError: 'str' object has no attribute 'pad_token_id'`
#36731 opened
Mar 14, 2025 -
`torch.compile` custom backend called by AotAutograd triggers recompiles when used with `CompileConfig`
#36725 opened
Mar 14, 2025 -
trainer.train()
#36723 opened
Mar 14, 2025 -
Add RoMa keypoint matcher
#36718 opened
Mar 14, 2025 -
`return_assistant_tokens_mask` argument is blocked in `ProcessorMixin.apply_chat_template`
#36713 opened
Mar 14, 2025 -
Support Flex Attention for encoder only models (XLMRoberta, ModernBERT etc...)
#36697 opened
Mar 13, 2025 -
Transformers 4.49.0 breaks nvdiffrast plugin loading
#36676 opened
Mar 12, 2025 -
The parameter 'text' may be None, as the comments say; this is confusing.
#36667 opened
Mar 12, 2025 -
[FEAT] [non-CUDA]: Support alternative implementation for `constraints.positive_definite.check`
#36660 opened
Mar 12, 2025 -
Qwen2 MoE manual `head_dim`
#36659 opened
Mar 12, 2025 -
Cannot run backward with tensor parallel
#36657 opened
Mar 12, 2025 -
AutoModel from_pretrained does not recursively download relative imports
#36653 opened
Mar 12, 2025 -
Marian RNN conversion support
#36651 opened
Mar 11, 2025 -
Hybrid models
#36646 opened
Mar 11, 2025 -
[Feature Request]: refactor _update_causal_mask to a public utility
#36640 opened
Mar 11, 2025 -
[BUG] Batch inference DDP + zero stage 3 = inference code hangs
#36638 opened
Mar 11, 2025 -
`output_hidden_states` only return part of hidden_state when setting `device_map="auto"`
#36636 opened
Mar 10, 2025 -
Difficulties with multi-GPU Inferencing
#36634 opened
Mar 10, 2025 -
Add Magma from Microsoft to Transformers
#36629 opened
Mar 10, 2025 -
Unable to use converted Llama 3.3 instruct model
#36628 opened
Mar 10, 2025 -
[deepspeed] any plans for deepspeed-domino?
#36624 opened
Mar 10, 2025 -
Can not use flash-attention and flash-varlen-attention on Ascend NPU
#36618 opened
Mar 9, 2025 -
In "02_how_to_generate", code cell 1 has an error message
#36613 opened
Mar 8, 2025 -
Not installable on arm64 due to jaxlib upper bound
#36611 opened
Mar 7, 2025 -
Making attention mechanism stackable
#36609 opened
Mar 7, 2025 -
Whisper pipeline returns empty segment for each processed audio chunk
#36602 opened
Mar 7, 2025 -
lm_head parameters missing from named_parameters() in Qwen2.5-VL-3B-Instruct model
#36598 opened
Mar 7, 2025 -
Error when changing vocab size when fine tuning llama-vision
#36590 opened
Mar 6, 2025 -
After tokenizers upgrade, the length of the token does not correspond to the length of the model
#36574 opened
Mar 6, 2025 -
txt2video
#36573 opened
Mar 6, 2025 -
In the latest version of transformers (4.49.0) matrix transformation error is encountered
#36571 opened
Mar 6, 2025 -
torch_dtype is actually used now?
#36567 opened
Mar 5, 2025 -
Add support for StableAdamW optimizer in Trainer
#36564 opened
Mar 5, 2025 -
Stop output to stdout in streamers.py methods
#36562 opened
Mar 5, 2025 -
Improving expected test results
#36561 opened
Mar 5, 2025 -
Allow video objects (np array etc.) in apply_chat_template (not just paths or urls)
#36560 opened
Mar 5, 2025 -
Error during processing: MllamaForCausalLM does not support Flash Attention 2.0 yet.
#36557 opened
Mar 5, 2025 -
Facing issue while getting model from Rag from_pretrained
#36548 opened
Mar 5, 2025 -
Wrong dependency: `"tensorflow-text<2.16"`
#36541 opened
Mar 4, 2025 -
Bug when computing positional IDs from embeddings
#36537 opened
Mar 4, 2025 -
Bug in LlavaNextProcessor when using do_pad=False
#36531 opened
Mar 4, 2025 -
Request: Add Flash Attention 2.0 Support for ViTMAEForPreTraining
#36527 opened
Mar 4, 2025 -
GraniteMoE’s implementation is not compatible with HF’s peft
#36518 opened
Mar 3, 2025 -
Object detection tutorial uses buggy dataset, may lead to crash during training
#36516 opened
Mar 3, 2025 -
model.generate function is not compatible with custom position_ids
#36510 opened
Mar 3, 2025 -
Can not use prompt tuning inference
#36509 opened
Mar 3, 2025 -
huggingface model does not support torch backward hooks
#36508 opened
Mar 3, 2025 -
add a param to control cache in streamer when returning output
#36505 opened
Mar 3, 2025 -
TypeError: object of type 'IterableDataset' has no len()
#36501 opened
Mar 3, 2025 -
Support Distill Depth Anything
#36499 opened
Mar 2, 2025 -
Error at scatter num_items_in_batch in ddp/dp
#36492 opened
Mar 2, 2025 -
llama code breaks with torch compile
#36484 opened
Mar 1, 2025 -
Add type checking to CI
#36481 opened
Feb 28, 2025 -
ViTPose tutorial fails
#36454 opened
Feb 27, 2025 -
Error in tiktoken integration example
#36438 opened
Feb 26, 2025 -
Unable to save model after training with tensor parallel
#36436 opened
Feb 26, 2025
176 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Janus model
#36053 commented on
Mar 20, 2025 • 144 new comments -
Add EfficientLoFTR model
#36355 commented on
Mar 19, 2025 • 75 new comments -
Add FAST
#35476 commented on
Mar 26, 2025 • 75 new comments -
Samhq model addition
#35147 commented on
Mar 26, 2025 • 39 new comments -
Add MLCD model
#36182 commented on
Mar 25, 2025 • 33 new comments -
Add TimesFM Time Series Forecasting Model
#34082 commented on
Mar 25, 2025 • 29 new comments -
Add StyleTTS 2
#35790 commented on
Mar 19, 2025 • 27 new comments -
Add evolla rebase main
#36232 commented on
Mar 24, 2025 • 19 new comments -
Add DeepSeek V2 Model into Transformers
#36400 commented on
Mar 26, 2025 • 19 new comments -
Add InternVL (2.5 MPO)
#35968 commented on
Mar 25, 2025 • 17 new comments -
Update deprecated/unused dependencies 🧹 🧹
#36419 commented on
Mar 4, 2025 • 14 new comments -
Add padding-free to bamba
#35861 commented on
Mar 14, 2025 • 13 new comments -
Add support for MiniMax's MiniMax-Text-01
#35831 commented on
Mar 20, 2025 • 13 new comments -
Add index selection for `output_hidden_states`
#33705 commented on
Mar 18, 2025 • 13 new comments -
`GPT2Model` StaticCache support
#35761 commented on
Mar 26, 2025 • 12 new comments -
Add Doge model
#35891 commented on
Mar 22, 2025 • 12 new comments -
Introduce modular files for speech models
#35902 commented on
Mar 21, 2025 • 10 new comments -
Add D-FINE Model into Transformers
#36261 commented on
Mar 26, 2025 • 6 new comments -
[MLU] Fix FA2 check error, remove deepspeed-mlu deps.
#36159 commented on
Mar 26, 2025 • 6 new comments -
make `num_items_in_batch` optional in compute_loss_func
#36426 commented on
Mar 26, 2025 • 5 new comments -
Fix Mask2Former Weight Initialization Issues #35877
#35904 commented on
Mar 24, 2025 • 5 new comments -
Add Segment Anything 2 (SAM2)
#32317 commented on
Mar 25, 2025 • 4 new comments -
switch from `training_args.bin` to `training_args.json`
#35010 commented on
Mar 10, 2025 • 4 new comments -
Add internlm3 dense
#35694 commented on
Mar 19, 2025 • 4 new comments -
Fix Batch Size Mismatch When Using `crops_n_layers` in `mask-generation` Pipeline #35530
#35627 commented on
Mar 20, 2025 • 3 new comments -
fix: support grad clipping for TP through replicating non-sharded modules
#36132 commented on
Mar 25, 2025 • 3 new comments -
Support QuestionAnswering Module for ModernBert based models.
#35566 commented on
Mar 25, 2025 • 2 new comments -
Add LightGlue model
#31718 commented on
Mar 10, 2025 • 2 new comments -
Introduce numpy/numba optimization to `Qwen2VLImageProcessor`
#36356 commented on
Feb 28, 2025 • 2 new comments -
Append best model checkpoint with active adapter when not default
#36201 commented on
Mar 13, 2025 • 2 new comments -
Flash Attention v3
#36190 commented on
Mar 24, 2025 • 1 new comment -
fix: dtype might change during resize
#36089 commented on
Feb 27, 2025 • 1 new comment -
[Whisper] Pipeline: handle long form generation
#35750 commented on
Mar 13, 2025 • 1 new comment -
[WIP]: Base multimodal model for VLLM's `transformers` backend
#36367 commented on
Mar 26, 2025 • 1 new comment -
Integrate xlstm cleanly.
#35377 commented on
Mar 25, 2025 • 1 new comment -
[i18n-zh] Translating kv_cache into zh-hans
#36412 commented on
Feb 27, 2025 • 1 new comment -
enable tp on CPU
#36299 commented on
Mar 26, 2025 • 1 new comment -
Add AIMv2 to Transformers
#35550 commented on
Mar 3, 2025 • 0 new comments -
Process inputs directly in apply_chat_template in image-text-to-text pipeline
#35616 commented on
Mar 24, 2025 • 0 new comments -
[Fix]Integrate LLM generation parameters into the evaluation method
#36416 commented on
Feb 27, 2025 • 0 new comments -
Bart: new cache format
#35314 commented on
Mar 14, 2025 • 0 new comments -
[generation] Support cache-cropping methods
#35591 commented on
Mar 11, 2025 • 0 new comments -
🔴 Video processors as a separate class
#35206 commented on
Mar 3, 2025 • 0 new comments -
Add Relation DETR
#34900 commented on
Mar 20, 2025 • 0 new comments -
Bye bye env vars, keep everything as configs
#34886 commented on
Mar 19, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Mar 21, 2025 • 0 new comments -
[`AutoDocstring`] Based on inspect parsing of the signature
#33771 commented on
Mar 17, 2025 • 0 new comments -
#33512 handle last element out of range error
#33625 commented on
Mar 12, 2025 • 0 new comments -
Update from pretrained error when loading
#33380 commented on
Mar 24, 2025 • 0 new comments -
Adding new zero-shot examples
#32483 commented on
Feb 28, 2025 • 0 new comments -
Skip non-selected experts for mixtral and qwen2_moe
#32429 commented on
Mar 11, 2025 • 0 new comments -
Trainer: add predict with generate
#32346 commented on
Mar 24, 2025 • 0 new comments -
Improve support for image generation with Chameleon & Anole
#32013 commented on
Mar 19, 2025 • 0 new comments -
Support Kosmos-2.5
#31711 commented on
Mar 25, 2025 • 0 new comments -
audio pipeline support for initial_prompt?
#27317 commented on
Mar 26, 2025 • 0 new comments -
warning bug in Qwen2DecoderLayer in transformers ==4.49
#36361 commented on
Mar 26, 2025 • 0 new comments -
The arguments in `utils/modular_model_converter.py` is different from those in docs
#36362 commented on
Mar 26, 2025 • 0 new comments -
What inference speed (tokens/s) do the full and 4-bit quantized DeepSeek-R1 models achieve with KTransformers for inference, and what compute configuration does each require?
#36363 commented on
Mar 26, 2025 • 0 new comments -
Support set non_blocking=True when move data from cpu to gpu
#36408 commented on
Feb 28, 2025 • 0 new comments -
[generate] Run custom generation code from the Hub
#36405 commented on
Feb 27, 2025 • 0 new comments -
Handle DAC conversion when using weight_norm with newer PyTorch versions
#36393 commented on
Mar 2, 2025 • 0 new comments -
Added Cosmos model files
#36389 commented on
Feb 27, 2025 • 0 new comments -
Fix: Use config.use_sliding_window instead of config.sliding_window
#36377 commented on
Mar 21, 2025 • 0 new comments -
chore(qwen2): display warning log only when sliding window attention …
#36316 commented on
Mar 8, 2025 • 0 new comments -
[`ModernBERT`] Never save 'reference_compile' config; should be set based on end user
#36305 commented on
Mar 20, 2025 • 0 new comments -
Update composition flag usage
#36263 commented on
Mar 19, 2025 • 0 new comments -
Add support for DeepseekAI's DeepseekVL
#36248 commented on
Mar 26, 2025 • 0 new comments -
Improvements in attention_forward functions
#36218 commented on
Mar 14, 2025 • 0 new comments -
Fix the eval_use_gather_object flag usage
#36214 commented on
Mar 18, 2025 • 0 new comments -
fix: condition bos_token_id and space as token
#36211 commented on
Mar 19, 2025 • 0 new comments -
Fixed dynamic module import when there is more than one dot in class …
#36198 commented on
Mar 24, 2025 • 0 new comments -
(ugly) Use `parallelism=4` for `check_repository_consistency`
#36197 commented on
Mar 20, 2025 • 0 new comments -
Set evaluation and checkpointing defaults to 'epoch' and reduce loggi…
#36133 commented on
Mar 13, 2025 • 0 new comments -
Add Phi-3.5-vision
#36036 commented on
Mar 14, 2025 • 0 new comments -
[ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification
#35991 commented on
Mar 7, 2025 • 0 new comments -
Idefics: remove double BOS token
#35950 commented on
Mar 11, 2025 • 0 new comments -
[ModernBERT] Add CausalLM functionality to ModernBERT
#35946 commented on
Mar 3, 2025 • 0 new comments -
[WIP] add deepseek-v3
#35926 commented on
Mar 26, 2025 • 0 new comments -
Missing weights not initialized properly #35437
#35913 commented on
Feb 27, 2025 • 0 new comments -
Several fixes related to rotary position embeddings
#35901 commented on
Mar 19, 2025 • 0 new comments -
Adds GGUF support for Gemma models
#35887 commented on
Mar 4, 2025 • 0 new comments -
Add MultipleChoice & QuestionAnswering heads to ModernBERT
#35825 commented on
Mar 3, 2025 • 0 new comments -
Remove head mask in generative models
#35786 commented on
Mar 19, 2025 • 0 new comments -
Add ColQwen2 to 🤗 transformers
#35778 commented on
Mar 26, 2025 • 0 new comments -
fix immediate quantization of the first token in QuantizedCache
#35760 commented on
Mar 20, 2025 • 0 new comments -
Pipeline: fix unnecessary warnings
#35753 commented on
Mar 17, 2025 • 0 new comments -
Mask2former & Maskformer Fast Image Processor
#35685 commented on
Mar 7, 2025 • 0 new comments -
[docs] add return_timestamps=True for Whisper long-form transcription
#35633 commented on
Mar 20, 2025 • 0 new comments -
Problem about using mBART50 for Russian to Chinese translation
#13116 commented on
Mar 7, 2025 • 0 new comments -
LayerDrop broken in various Flax models (Whisper/BART/more...)
#35468 commented on
Mar 8, 2025 • 0 new comments -
`Llama-3.2-11B-Vision-Instruct` (`mllama`) FSDP fails if grad checkpointing is enabled
#36040 commented on
Mar 8, 2025 • 0 new comments -
DeepSeek V3 Support
#35425 commented on
Mar 8, 2025 • 0 new comments -
Unknown quantization type, got fp8
#35471 commented on
Mar 9, 2025 • 0 new comments -
Missing weights are not properly initialized when using model.from_pretrained()
#35437 commented on
Mar 9, 2025 • 0 new comments -
[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float'
#33552 commented on
Mar 10, 2025 • 0 new comments -
Inconsistent saving of tokenizer with custom code from HF hub vs. local directory
#35597 commented on
Mar 10, 2025 • 0 new comments -
Mask2FormerImageProcessor support overlapping features
#35536 commented on
Mar 11, 2025 • 0 new comments -
Add the support for deepseek architecture .gguf
#36144 commented on
Mar 13, 2025 • 0 new comments -
FSDP Torch XLA vs. FSDPv2 (SMPD) Torch XLA checkpoint saving bug
#36004 commented on
Mar 13, 2025 • 0 new comments -
`trainer.evaluate` always creates a new MLFlow run, separate from the one used during `train()`
#35074 commented on
Mar 13, 2025 • 0 new comments -
A word-level timestamps on whisper generation pipeline is mismatched to total duration
#36228 commented on
Mar 13, 2025 • 0 new comments -
Device Movement Error with 4-bit Quantized LLaMA 3.1 Model Loading
#36272 commented on
Mar 13, 2025 • 0 new comments -
XLA FSDP V2 + TPU + T5 Family Models doesn't work
#35142 commented on
Mar 13, 2025 • 0 new comments -
oom when using adafactor optimizer in deepspeed
#33290 commented on
Mar 13, 2025 • 0 new comments -
Request to add DINO object detector
#36205 commented on
Mar 14, 2025 • 0 new comments -
TypeError: ModernBertModel.forward() got an unexpected keyword argument 'num_items_in_batch'
#36074 commented on
Mar 14, 2025 • 0 new comments -
WhisperForCTC
#26242 commented on
Mar 14, 2025 • 0 new comments -
Add support for context parallelism
#35983 commented on
Mar 14, 2025 • 0 new comments -
denoising with sentence permutation, and language sampling
#11129 commented on
Mar 15, 2025 • 0 new comments -
Jitter Noise added to input being passed to experts in Switch Transformers
#33969 commented on
Mar 15, 2025 • 0 new comments -
Support sliding_window for sdpa in qwen2
#36351 commented on
Feb 27, 2025 • 0 new comments -
Add cosmos from Nvidia
#35565 commented on
Feb 27, 2025 • 0 new comments -
add Flash Attention Support for Helsinki-NLP/opus models
#36169 commented on
Feb 28, 2025 • 0 new comments -
[Feature Request] We might need a function to change the sampler used in trainer dataloader
#26802 commented on
Feb 28, 2025 • 0 new comments -
Error From BitsandBytes
#36371 commented on
Feb 28, 2025 • 0 new comments -
Export to ExecuTorch
#32253 commented on
Mar 1, 2025 • 0 new comments -
Speed up image processors - cast to array before BatchFeature
#31205 commented on
Mar 2, 2025 • 0 new comments -
Support SDPA & Flash Attention 2 for LayoutLMv3
#35467 commented on
Mar 2, 2025 • 0 new comments -
Support H100 training with FP8 in Trainer and Deepspeed
#25333 commented on
Mar 2, 2025 • 0 new comments -
SAM mask-generation - crops_n_layers
#35530 commented on
Mar 3, 2025 • 0 new comments -
_batch_encode_plus() got an unexpected keyword argument 'is_pretokenized' using BertTokenizerFast
#17488 commented on
Mar 4, 2025 • 0 new comments -
Add EVEv2 : an Encoder-free VLM
#36379 commented on
Mar 4, 2025 • 0 new comments -
Saving model with shared tensors fails on cpu but succeeds on gpu
#33688 commented on
Mar 4, 2025 • 0 new comments -
Add `Tensor Parallel` support for ALL models
#34789 commented on
Mar 4, 2025 • 0 new comments -
Possible bug when using cosine lr scheduler with gradient accumulation
#35484 commented on
Mar 4, 2025 • 0 new comments -
- (False?) warning about weight_g/weight_v missing on WeightNorm on PyTorch (#26796, commented on Mar 4, 2025)
- Add support for Molmo (#33710, commented on Mar 4, 2025)
- The output tensor's data type is not torch.long when the input text is empty (#36277, commented on Mar 4, 2025)
- Is the T5 model supported with HQQ quantization? (AttributeError: 'HQQLinear' object has no attribute 'weight') (#36254, commented on Mar 4, 2025)
- Redirect logging output to `stdout` instead of `stderr` (#34613, commented on Mar 6, 2025)
- Enable quantized KV cache for the Mistral model (#35041, commented on Mar 7, 2025)
- llama `tie_word_embeddings` ignored on CPU and with auto dtype only (#33689, commented on Mar 7, 2025)
- Rework `test_multi_gpu_data_parallel_forward` (#31087, commented on Mar 22, 2025)
- flash_attention_2 2.7.2.post1 seems to crash when using `torch.compile` and `DataCollatorWithFlattening` (#35588, commented on Mar 22, 2025)
- PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration (#10105, commented on Mar 23, 2025)
- DeepSpeed ZeRO-3 zero3_save_16bit_model is not compatible with resume_from_checkpoint (#36317, commented on Mar 23, 2025)
- Bug with num_update_steps_per_epoch in the _inner_training_loop function (#36297, commented on Mar 23, 2025)
- Tensor parallel training bug (#36296, commented on Mar 23, 2025)
- safetensors/mmap memory leak when per-layer weights are converted to other dtypes (#34366, commented on Mar 23, 2025)
- Add an argument to set the number of eval steps in Trainer (#31561, commented on Mar 24, 2025)
- Assisted generation slower than with the base model alone (#36337, commented on Mar 24, 2025)
- Unable to use Seq2SeqTrainingArguments and Seq2SeqTrainer (#36330, commented on Mar 24, 2025)
- Whisper word-level timestamp extraction fails with beam search (#36093, commented on Mar 24, 2025)
- Multi-GPU training crashes with IterableDataset and different-length inputs (e.g. next-token prediction) (#35308, commented on Mar 24, 2025)
- Inference with FSDP during training affects checkpoints (#34530, commented on Mar 24, 2025)
- Mask2Former _init_weights (#35877, commented on Mar 24, 2025)
- Recomputed tensor size does not match when using activation checkpointing with FSDP and accelerate (#34928, commented on Mar 25, 2025)
- ValueError: Trying to set a tensor of shape torch.Size([128256, 3072]) in "weight" (which has shape torch.Size([128003, 3072])), this looks incorrect (#36350, commented on Mar 25, 2025)
- Accelerate x Trainer issue tracker (#33345, commented on Mar 25, 2025)
- CVE-2024-11392: AWS Scanner and Trivy flagging Transformers 4.48.1 as vulnerable (#36041, commented on Mar 25, 2025)
- Multi-GPU: test_model_parallel_beam_search tests fail with "IndexError: list index out of range" (#35824, commented on Mar 25, 2025)
- `Helsinki-NLP/opus-mt-it-en` isn't on the Hugging Face Hub (#26382, commented on Mar 25, 2025)
- AttributeError: 'dict' object has no attribute 'to_dict' when running inference with LoRA-merged Qwen/Qwen2.5-VL-3B-Instruct (#36281, commented on Mar 25, 2025)
- TypeError: CustomTrainer.compute_loss() got an unexpected keyword argument 'num_items_in_batch' (#36331, commented on Mar 25, 2025)
- The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct (#33399, commented on Mar 16, 2025)
- Offline mode doesn't work with models that require `trust_remote_code=True` (#34855, commented on Mar 16, 2025)
- ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [267,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed (#33985, commented on Mar 17, 2025)
- Multi-task classification and label_names on Trainer (#33193, commented on Mar 17, 2025)
- SDPA `is_causal=False` has no effect due to `LlamaModel._prepare_4d_causal_attention_mask_with_cache_position` (#36150, commented on Mar 17, 2025)
- Model trained with Flash Attention 2.0 raises "RuntimeError: query and key must have the same dtype" when generating (#30019, commented on Mar 18, 2025)
- Tensor size mismatch when trying to run RT-DETR on multiple GPUs (#33165, commented on Mar 18, 2025)
- Custom 4D tensor caused shape mismatch error (#35290, commented on Mar 18, 2025)
- Cryptic error when using AutoTokenizer with SentencePiece tokenizers without sentencepiece installed (#36291, commented on Mar 19, 2025)
- Warning that `MultiScaleDeformableAttention.so` is not found in `/root/.cache/torch_extensions` if `ninja` is installed alongside `transformers` (#35349, commented on Mar 19, 2025)
- Inconsistent output lengths when `max_length=20` is set implicitly vs explicitly in `generate()` (#35765, commented on Mar 19, 2025)
- `AutoModelForCasualLM.from_pretrained()` exits without warning/error (#36245, commented on Mar 19, 2025)
- IsADirectoryError when training with tqdm enabled for Trainer (#34766, commented on Mar 20, 2025)
- Incompatibility of flash_attention_2 + Llama + Transformers>=4.43 + autocast to fp16 (#36224, commented on Mar 20, 2025)
- modeling_phi3 errors with AttributeError: 'DynamicCache' object has no attribute 'get_max_length' (#36071, commented on Mar 20, 2025)
- ValueError: Unrecognized image processor in Qwen/Qwen2.5-VL-3B-Instruct (#36193, commented on Mar 21, 2025)
- cannot import name 'is_timm_config_dict' from 'transformers.utils.generic' (#36068, commented on Mar 21, 2025)
- Community contribution: adding GGUF support for more architectures (#33260, commented on Mar 21, 2025)
- Qwen2VLForConditionalGeneration doesn't work with MPS devices (#36413, commented on Mar 21, 2025)
- [Bugs] RuntimeError: No CUDA GPUs are available in transformers v4.48.0 or above when running the Ray RLHF example (#36295, commented on Mar 22, 2025)
- past_key_value(s) naming inconsistency causing problems (#36290, commented on Mar 22, 2025)
- model.gradient_checkpointing_enable() makes loss.requires_grad be False (#35826, commented on Mar 22, 2025)