Deprecate AutoModelForVision2Seq #38900

zucchini-nlp · 2025-06-19T06:44:46Z

What does this PR do?

As per the title: we'll remove it eventually anyway, so let's start raising deprecation warnings. I already ask all new models not to use Vision2Seq.
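For readers unfamiliar with the pattern, here is a minimal sketch of a deprecation shim using only the standard library. It is not the code from this PR: `RecommendedAuto` and `DeprecatedAuto` are hypothetical stand-ins for `AutoModelForImageTextToText` and `AutoModelForVision2Seq`, and the warning category and message wording are assumptions.

```python
import warnings


class RecommendedAuto:
    """Hypothetical stand-in for the recommended auto class."""

    @classmethod
    def from_pretrained(cls, checkpoint, **kwargs):
        # A real auto class would resolve and load a model here;
        # this sketch only echoes the checkpoint name.
        return ("loaded", checkpoint)


class DeprecatedAuto(RecommendedAuto):
    """Hypothetical stand-in for the deprecated class: warn, then delegate."""

    @classmethod
    def from_pretrained(cls, checkpoint, **kwargs):
        # FutureWarning is an assumed category; it is shown to end users by default.
        warnings.warn(
            "DeprecatedAuto is deprecated and will be removed in a future "
            "release. Use RecommendedAuto instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return super().from_pretrained(checkpoint, **kwargs)
```

Existing callers keep working and see one warning per call site under Python's default filters, which gives downstream libraries time to migrate before the class is removed.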

HuggingFaceDocBuilderDev · 2025-06-19T06:57:50Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Cyrilvallez · 2025-07-07T08:41:32Z

I don't remember all the discussions around this, but is ImageTextToText the one that will be fully future-proof? So we're not going with ForCausalLM in the end?

zucchini-nlp · 2025-07-07T08:57:27Z

I don't think we'll remove ImageTextToText in the near future, though there were discussions about unifying everything under one umbrella mapping such as "AutoForMultimodal". I haven't started working on the new auto class yet, but I think it will be a new auto-mapping that:

Unifies all generative models, either under one class or under two classes (one for text and the other for multimodal models)
Mainly serves to clean up repetitive code. We don't plan to delete the existing auto classes, since that might break a lot of external libraries that depend on us. Until the new auto class is added, a surge of new VLMs will keep being added to the mapping, which makes it harder to just remove it

Personally, I'm in favor of keeping ImageTextToText for a while as the recommended mapping
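Downstream code that must support several transformers releases could prefer the recommended class and fall back to the deprecated one. The sketch below is one possible approach, not something this PR provides: `resolve_vlm_auto_class` is a hypothetical helper, and only the two class names come from the discussion above.

```python
def resolve_vlm_auto_class(module):
    """Return the preferred vision-language auto class exposed by `module`.

    Tries the recommended name first, then the deprecated one.
    """
    candidates = ("AutoModelForImageTextToText", "AutoModelForVision2Seq")
    for name in candidates:
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    raise ImportError(f"none of {candidates} found in {module!r}")


# Usage (assumes transformers is installed):
#   import transformers
#   AutoVLM = resolve_vlm_auto_class(transformers)
#   model = AutoVLM.from_pretrained(checkpoint)
```

Passing the module in explicitly keeps the helper free of a hard import, so it can be exercised without transformers installed.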

Cyrilvallez

LGTM!

github-actions · 2025-07-14T09:27:18Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto

deprecate vision2seq

3df0f38

zucchini-nlp requested a review from ArthurZucker June 19, 2025 06:45

zucchini-nlp added 2 commits June 20, 2025 09:04

Merge branch 'main' into deprecate-vision2seq

ddeae3e

Merge branch 'main' into deprecate-vision2seq

c1a581b

zucchini-nlp requested a review from Cyrilvallez July 1, 2025 12:54

zucchini-nlp added 2 commits July 1, 2025 14:54

Merge branch 'main' into deprecate-vision2seq

ee94607

Merge branch 'main' into deprecate-vision2seq

fbe3d6f

zucchini-nlp mentioned this pull request Jul 7, 2025

Fix errors when use verl to train GLM4.1v model #39199

Merged

5 tasks

kaln27 mentioned this pull request Jul 11, 2025

[misc] fix: Use AutoModelForImageTextToText instead of AutoModelForVision2Seq volcengine/verl#2475

Open

Cyrilvallez approved these changes Jul 14, 2025

Merge branch 'main' into deprecate-vision2seq

82df709

zucchini-nlp merged commit 878d60a into huggingface:main Jul 14, 2025
25 checks passed

rjgleaton pushed a commit to rjgleaton/transformers that referenced this pull request Jul 17, 2025

Deprecate AutoModelForVision2Seq (huggingface#38900)

2d9d8c6

deprecate vision2seq
