
Mllama not supported by AutoModelForCausalLM after updating transformers to 4.50.0 #36926

Closed
2 of 4 tasks
WuHaohui1231 opened this issue Mar 24, 2025 · 2 comments · Fixed by #36917
WuHaohui1231 commented Mar 24, 2025

System Info

  • transformers version: 4.50.0
  • Platform: Linux-5.15.0-100-generic-x86_64-with-glibc2.35
  • Python version: 3.12.2
  • Huggingface_hub version: 0.29.3
  • Safetensors version: 0.5.3
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA A40

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce the behavior:

  1. Install the latest version of transformers (4.50.0)
  2. Run the following:

from transformers import AutoModelForCausalLM
import torch

model_name = "meta-llama/Llama-3.2-11B-Vision"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)

Got the error:

ValueError: Unrecognized configuration class <class 'transformers.models.mllama.configuration_mllama.MllamaTextConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DiffLlamaConfig, ElectraConfig, Emu3Config, ErnieConfig, FalconConfig, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, GitConfig, GlmConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeSharedConfig, HeliumConfig, JambaConfig, JetMoeConfig, LlamaConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MllamaConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, ZambaConfig, Zamba2Config.

However, the latest documentation states that the Mllama model is supported:
https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM.from_pretrained

I tested this in an environment with transformers==4.49.0, and the model loaded without issue.
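As a stopgap, loading through the concrete model class should sidestep the failing Auto dispatch, since the ValueError is raised by AutoModelForCausalLM's config-to-class mapping. A minimal sketch (my assumption, not verified in this thread):

from transformers import MllamaForConditionalGeneration
import torch

# Bypasses the AutoModelForCausalLM mapping by naming the Mllama class directly.
model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision",
    device_map="auto",
    torch_dtype=torch.float16,
)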

Expected behavior

The multimodal Mllama model (Llama-3.2-11B-Vision) loads successfully.

@zucchini-nlp
Member

Will be fixed by #36917

BTW, as a side note, loading a multimodal model should be done with AutoModelForImageTextToText, unless we want to load only the language model part. In the future we will restrict AutoModelForCausalLM to loading only the LM backbone. In the case of Mllama, I think it was loading the text backbone anyway :)
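For illustration, a minimal sketch of the recommended loading path, reusing the checkpoint from above (the class name is real; the exact call is a sketch, not taken from this thread):

from transformers import AutoModelForImageTextToText
import torch

# Loads the full multimodal model (vision encoder + language model backbone).
model = AutoModelForImageTextToText.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision",
    device_map="auto",
    torch_dtype=torch.float16,
)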

@WuHaohui1231
Author

Thanks for the prompt reply and the note :)
