[vlm] fix loading of retrieval VLMs #39242


Open · wants to merge 7 commits into main

Conversation

zucchini-nlp (Member)

What does this PR do?

As per the title: it was reported internally that slow tests are failing. We need to apply the same changes as in the VLMs to the models that use those VLMs in their architecture.
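For context, a minimal sketch of the kind of loading these slow tests exercise (the checkpoint name is an assumption for illustration; the actual tests live under tests/models/colpali and tests/models/colqwen2):

```python
# Minimal sketch of what the failing slow tests exercise: a retrieval
# model that wraps a backbone VLM. The checkpoint name is an assumption
# for illustration, not necessarily the one pinned in CI.
import torch
from transformers import ColPaliForRetrieval, ColPaliProcessor

model_id = "vidore/colpali-v1.2-hf"  # assumed HF-format checkpoint
processor = ColPaliProcessor.from_pretrained(model_id)
model = ColPaliForRetrieval.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Embed a text query; the loading step above is where the reported
# failures surface if the backbone's state-dict keys don't line up.
inputs = processor(text=["What is shown in the figure?"], return_tensors="pt")
with torch.no_grad():
    query_embeddings = model(**inputs).embeddings  # multi-vector embeddings
```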

@zucchini-nlp (Member, Author)

run-slow: colpali, colqwen2

github-actions bot (Contributor) commented Jul 7, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/colpali', 'models/colqwen2']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp requested review from ydshieh and Cyrilvallez and removed request for ydshieh July 7, 2025 08:38
@zucchini-nlp (Member, Author)

I wanted to use AutoModel, as we shouldn't be loading the lm-head for these models. But the Qwen2-based model was released after the refactor and currently works without any conversion key mapping, and I don't want us to add another key mapping just to use AutoModel instead of AutoModelForImageTextToText.
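To illustrate the trade-off being weighed (the checkpoint name below is a placeholder, not a real repo):

```python
# Sketch of the trade-off described above, with a placeholder checkpoint.
# AutoModelForImageTextToText resolves to the conditional-generation class,
# which carries the lm_head; AutoModel resolves to the headless base model,
# but routing these retrieval checkpoints through it would require adding
# another conversion key mapping.
from transformers import AutoModel, AutoModelForImageTextToText

ckpt = "org/backbone-vlm"  # placeholder

# What this PR uses: loads the backbone including the (unused) lm-head.
backbone_with_head = AutoModelForImageTextToText.from_pretrained(ckpt)

# What the comment would prefer, if the key mapping existed:
backbone_headless = AutoModel.from_pretrained(ckpt)
```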

@ydshieh (Collaborator) left a comment


Looks like a reasonable fix to me (as they seem to apply the same changes made to the VLMs).

@zucchini-nlp I observed that the two model tests fail in two different PRs, but maybe they share the same cause, so their fixes end up identical here?

For context:

- colpali tests are failing after [VLM] Add base model without head (#37033).
- colqwen2 tests are failing after [qwen] refactor attentions for vision/audio (#38930); there is a fix, [qwen2-vl] fix vision attention scaling (#39043), but that one doesn't fix colqwen.

@zucchini-nlp (Member, Author)

Hmm, ColQwen for me wasn't failing in the sense of a weight mismatch: the weights matched when loading. But the tensors aren't close enough, even after the model was released. I can check on the runners and see what the issue is.

ColQwen shouldn't have the same issue; it was released after the major refactor.
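(For reference, the kind of closeness check meant here, reusing `model` and `inputs` from the sketch above; the expected values and tolerances are illustrative only:)

```python
# Hypothetical check one might run on the runners: weights can match at
# load time while outputs still drift past tolerance, which is the
# failure mode described above. Expected values are placeholders.
import torch

expected_slice = torch.tensor([0.0123, -0.0456, 0.0789])  # placeholder
with torch.no_grad():
    actual_slice = model(**inputs).embeddings[0, 0, :3].float()

torch.testing.assert_close(actual_slice, expected_slice, rtol=1e-4, atol=1e-4)
```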

@ydshieh (Collaborator) commented Jul 8, 2025

Hi, sorry, I think my memory got mixed up.

#39043 (comment)

So that issue was already fixed, but my brain wasn't yet.

github-actions bot (Contributor) commented Jul 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: colpali, colqwen2
