Create and Expose SamVisionModel as public for better accessibility #36493

geetu040 · 2025-03-02T06:17:35Z

What does this PR do?

This PR makes SamVisionEncoder publicly accessible while keeping its name unchanged. This improves usability by allowing users to instantiate the vision encoder directly, similar to how SamModel is used.

from transformers import SamVisionConfig, SamVisionEncoder
config = SamVisionConfig()
model = SamVisionEncoder(config)

Previously discussed here: #36248 (comment)

Before submitting

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@amyeroberts, @qubvel, @zucchini-nlp


github-actions · 2025-03-02T06:17:47Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

geetu040 · 2025-03-02T08:32:15Z

Hi @qubvel and @zucchini-nlp

I have updated SamVisionEncoder to make it public, but I am having a problem making TFSamVisionEncoder public.

First, I am not even sure whether TFSamVisionEncoder needs to be public. I am just following the check in utils/check_repo.py, which fails to find TFSamVisionEncoder.

TFSamVisionEncoder stores its vision layers in the attribute self.layers. This works fine when TFSamVisionEncoder inherits from keras.layers.Layer, but fails when it inherits from TFPreTrainedModel, which inherits from keras.models.Model, because keras.models.Model already defines layers as a read-only property.

Renaming that attribute would break backward compatibility, so we can either:

completely drop the idea of making SamVisionEncoder public and use modular wherever it is needed as a backbone (in the context of Deepseek-VL, #36248), or
make SamVisionEncoder public, leave TFSamVisionEncoder as it is, and exempt it from the relevant tests.
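The attribute clash described above can be reproduced without TensorFlow. The sketch below is a plain-Python mimic, not Keras code: FakeKerasModel and FakeKerasLayer are hypothetical stand-ins that model only the one detail that matters here, namely that the model base class exposes layers as a property with no setter.

```python
class FakeKerasModel:
    """Hypothetical stand-in for keras.models.Model: `layers` is a getter-only property."""

    def __init__(self):
        self._tracked = []

    @property
    def layers(self):
        # No setter is defined, so `self.layers = ...` on an instance raises AttributeError.
        return list(self._tracked)


class FakeKerasLayer:
    """Hypothetical stand-in for keras.layers.Layer: no `layers` property at all."""


class EncoderAsLayer(FakeKerasLayer):
    def __init__(self, num_layers):
        super().__init__()
        self.layers = [f"layer_{i}" for i in range(num_layers)]  # plain attribute, works


class EncoderAsModel(FakeKerasModel):
    def __init__(self, num_layers):
        super().__init__()
        self.layers = [f"layer_{i}" for i in range(num_layers)]  # collides with the property


def try_build(cls, num_layers=2):
    """Instantiate `cls` and report whether the `layers` assignment succeeded."""
    try:
        cls(num_layers)
        return "ok"
    except AttributeError as err:
        return f"AttributeError: {err}"


print(try_build(EncoderAsLayer))  # ok
print(try_build(EncoderAsModel))  # AttributeError (exact message depends on the Python version)
```

Assigning to a name that a base class defines as a getter-only property raises AttributeError. That is why keeping the attribute name and switching the base class cannot both happen without some workaround.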

zucchini-nlp

LGTM overall!

For the TF model, do you mean that making it a PreTrainedModel is causing issues, or just making it public (i.e. adding it to the top-level __init__)? I am not very familiar with TF models in transformers, but I think we can figure out a way. Maybe @qubvel has more expertise on that?

Keeping SamVisionEncoder as a pre-trained model is important if we want to use it as a vision backbone in VLMs. It is especially useful for setting a different load-time configuration for each backbone.

src/transformers/models/auto/modeling_auto.py

tests/models/sam/test_modeling_sam.py

geetu040 · 2025-03-04T01:38:47Z

@zucchini-nlp

I have added TFSamVisionEncoder to __init__, that fixes the failing check.

However, TFSamVisionEncoder still does not inherit from TFSamPreTrainedModel and is therefore not added to auto modeling.
It fails to inherit for the reason described in #36493 (comment):

TFSamVisionEncoder stores its vision layers in the attribute self.layers. This works fine when TFSamVisionEncoder inherits from keras.layers.Layer, but fails when it inherits from TFPreTrainedModel, which inherits from keras.models.Model, because keras.models.Model already defines layers as a read-only property.

By the way, SamVisionEncoder works fine. To summarize:

SamVisionEncoder

imported in __init__, available via from transformers import SamVisionEncoder
part of auto modeling
inherits from PreTrainedModel
available in docs

TFSamVisionEncoder

imported in __init__, available via from transformers import TFSamVisionEncoder
not part of auto modeling
does not inherit from TFPreTrainedModel
not available in docs
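The two summaries above line up with the kind of consistency rules a repository check enforces. The sketch below is a toy model of such rules, written only for illustration; it is not the logic of utils/check_repo.py, and the ModelClass record and find_issues helper are hypothetical. It encodes two assumed rules: every public model class must be exported from the top-level __init__, and only PreTrainedModel subclasses are expected in an auto mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelClass:
    name: str
    is_pretrained_model: bool  # inherits from (TF)PreTrainedModel?
    exported: bool             # importable via `from transformers import ...`?
    in_auto_mapping: bool      # registered in an Auto* mapping?


def find_issues(classes):
    """Toy rules: every model class must be exported; only PreTrainedModel
    subclasses are expected to appear in an auto mapping."""
    issues = []
    for cls in classes:
        if not cls.exported:
            issues.append(f"{cls.name}: not exported from __init__")
        if cls.is_pretrained_model and not cls.in_auto_mapping:
            issues.append(f"{cls.name}: missing from auto mapping")
    return issues


# State summarized above, after TFSamVisionEncoder was added to __init__.
after = [
    ModelClass("SamVisionEncoder", is_pretrained_model=True, exported=True, in_auto_mapping=True),
    ModelClass("TFSamVisionEncoder", is_pretrained_model=False, exported=True, in_auto_mapping=False),
]
# Earlier state: TFSamVisionEncoder not yet exported.
before = [after[0], ModelClass("TFSamVisionEncoder", False, False, False)]

print(find_issues(before))  # ['TFSamVisionEncoder: not exported from __init__']
print(find_issues(after))   # []
```

Under these toy rules, exporting TFSamVisionEncoder is enough to pass, and it is not required in an auto mapping because it does not inherit from TFPreTrainedModel.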

geetu040 · 2025-03-04T01:51:59Z

The failing checks at ci/circleci: tests_torch are fixed in #36422.

qubvel

Hey @geetu040, thanks for working on the PR! To make it public, I suppose we should add a VisionModel class for consistency with other models in the repo; see CLIP / SigLIP for example.

qubvel · 2025-03-17T10:52:33Z

src/transformers/models/sam/configuration_sam.py

@@ -135,7 +135,7 @@ def __init__(

 class SamVisionConfig(PretrainedConfig):
    r"""
-    This is the configuration class to store the configuration of a [`SamVisionModel`]. It is used to instantiate a SAM


We should name it SamVisionModel for consistency; see CLIP/SigLIP for example.

So we should add a SamVisionModel class that subclasses PreTrainedModel.

@qubvel, you are suggesting renaming SamVisionEncoder to SamVisionModel, right? I hope that doesn't break backward compatibility.

+1, simply renaming it breaks BC for people who load the vision encoder class directly. We can either alias it (something like SamVisionEncoder = SamVisionModel) or add the new class separately.

Yes, renaming is going to be too breaking, it's better to add it as a separate class
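As a small illustration of the alias option mentioned above, the sketch below uses hypothetical stand-in classes, not the transformers ones. It shows that an alias keeps the old import name working, but the old name then refers to the new class itself, including its base class and its __name__.

```python
class PreTrainedModel:
    """Hypothetical stand-in for transformers.PreTrainedModel."""


class SamVisionModel(PreTrainedModel):
    """Stand-in for the new public vision model."""


# Alias option: the old name simply points at the new class.
SamVisionEncoder = SamVisionModel

print(SamVisionEncoder is SamVisionModel)             # True
print(issubclass(SamVisionEncoder, PreTrainedModel))  # True
print(SamVisionEncoder.__name__)                      # SamVisionModel
```

Adding SamVisionModel as a separate class, which is what the PR ended up doing, leaves the existing SamVisionEncoder untouched instead.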

tests/models/sam/test_modeling_sam.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

geetu040 · 2025-03-17T17:34:50Z

@qubvel
SamVisionModel has been added alongside SamVisionEncoder.

qubvel

Thanks for the update! Just a small nit to make it simpler and reuse the existing code.
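The final commit, "reuse SamVisionEncoder in SamVisionModel", follows this suggestion by wrapping the encoder rather than duplicating it. The sketch below shows that composition pattern with hypothetical stand-in classes; the dict-based config, the placeholder computation, and the vision_encoder attribute name are assumptions for illustration, not the actual transformers code.

```python
class PreTrainedModel:
    """Hypothetical stand-in for transformers.PreTrainedModel (weight loading etc. omitted)."""

    def __init__(self, config):
        self.config = config


class SamVisionEncoder:
    """Stand-in for the existing encoder; left exactly as it was."""

    def __init__(self, config):
        self.config = config

    def __call__(self, pixel_values):
        # Placeholder computation standing in for patch embedding + transformer layers.
        return {"last_hidden_state": [x * self.config["scale"] for x in pixel_values]}


class SamVisionModel(PreTrainedModel):
    """New public class: adds the PreTrainedModel interface and delegates to the encoder."""

    def __init__(self, config):
        super().__init__(config)
        self.vision_encoder = SamVisionEncoder(config)  # reuse the encoder, don't copy it

    def forward(self, pixel_values):
        # All of the actual work is delegated to the wrapped encoder.
        return self.vision_encoder(pixel_values)


model = SamVisionModel({"scale": 2})
print(model.forward([1, 2, 3]))  # {'last_hidden_state': [2, 4, 6]}
```

With this design the encoder code lives in one place, existing users of SamVisionEncoder see no change, and only the wrapper carries the PreTrainedModel behaviour.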

src/transformers/models/sam/modeling_sam.py

geetu040 · 2025-03-19T15:19:11Z

ping @qubvel for review

geetu040 · 2025-03-26T06:30:39Z

Hello @qubvel, a soft reminder for review

qubvel · 2025-03-26T10:18:20Z

Thanks for the ping @geetu040, will review today

qubvel

Ok, looks good to me! Thanks for working on it.

P.S. Test failures are unrelated.

cc @ArthurZucker to confirm and merge

ArthurZucker

Thanks 🤗

…uggingface#36493)

* move encoder below
* auto modeling
* write SamVisionTester
* fix vision attention shape
* fix SamVisionTest
* minor changes to SamVisionTest
* Revert "fix vision attention shape"

This reverts commit d2a4083.

* fix attention output shape in new tests
* remove encoder examples
* run modular on got_ocr2
* code formatting
* fix got_ocr2
* ruff fixes
* code quality
* add sam_vision in auto modeling and auto configuration
* remove composite test
* updated index.md
* add TFSamVisionEncoder to __init__
* fix public TFSamVisionEncoder
* remove outdated todo comment
* set test_torch_exportable

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* rename: VisionEncoder -> VisionModel
* bring back original SamVisionEncoder
* rename back: VisionEncoderOutput -> VisionModelOutput
* undo changes in SamModelTester
* reuse SamVisionEncoder in SamVisionModel

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

geetu040 added 9 commits February 25, 2025 08:17

move encoder below

24f8f6f

auto modeling

7d60223

Merge branch 'main' into sam-vision-encoder

60d77b3

write SamVisionTester

6a65a88

fix vision attention shape

d2a4083

fix SamVisionTest

1cdf8c7

minor changes to SamVisionTest

61626ca

Revert "fix vision attention shape"

a1258f3

This reverts commit d2a4083.

fix attention output shape in new tests

cd42ffb

github-actions bot marked this pull request as draft March 2, 2025 06:17

geetu040 added 5 commits March 2, 2025 11:31

remove encoder examples

2af72b5

run modular on got_ocr2

bdac520

code formatting

d5ff273

fix got_ocr2

9ae0b98

ruff fixes

a4a60fc

code quality

6934b1c

zucchini-nlp reviewed Mar 3, 2025


src/transformers/models/auto/modeling_auto.py (outdated)

tests/models/sam/test_modeling_sam.py (outdated)

geetu040 mentioned this pull request Mar 3, 2025

🚨🚨🚨 Fix sdpa in sam and refactor relative position embeddings #36422

Merged

geetu040 added 5 commits March 3, 2025 20:18

add sam_vision in auto modeling and auto configuration

aaf6c53

remove composite test

760d2d2

updated index.md

5336bac

Merge branch "main" and resolve conflicts

d0e7d18

add TFSamVisionEncoder to __init__

1766a0f

geetu040 marked this pull request as ready for review March 4, 2025 01:33

geetu040 added 2 commits March 7, 2025 09:18

Merge remote-tracking branch 'origin/main' into sam-vision-encoder

06cce05

fix public TFSamVisionEncoder

88ea9d4

qubvel reviewed Mar 17, 2025


geetu040 and others added 6 commits March 17, 2025 17:59

Merge branch 'main' into sam-vision-encoder

f791dc4

set test_torch_exportable

f148ff3

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

rename: VisionEncoder -> VisionModel

caca906

bring back original SamVisionEncoder

6ed683e

rename back: VisionEncoderOutput -> VisionModelOutput

159022f

undo changes in SamModelTester

a1652c6

geetu040 changed the title from "Make SamVisionEncoder public for better accessibility" to "Create and Expose SamVisionModel as public for better accessibility" Mar 17, 2025

geetu040 requested a review from qubvel March 17, 2025 17:20

geetu040 force-pushed the sam-vision-encoder branch from 756f6f7 to a1652c6 March 18, 2025 05:42

Merge branch 'main' into sam-vision-encoder

08e1e5d

qubvel reviewed Mar 18, 2025


src/transformers/models/sam/modeling_sam.py (outdated)

reuse SamVisionEncoder in SamVisionModel

51de71d

geetu040 requested a review from qubvel March 18, 2025 09:18

Merge branch 'main' into sam-vision-encoder

26af99c

qubvel approved these changes Mar 27, 2025


Merge branch 'main' into sam-vision-encoder

64080f4

qubvel requested a review from ArthurZucker March 27, 2025 14:55

ArthurZucker approved these changes Mar 31, 2025


ArthurZucker merged commit 0710e9b into huggingface:main Mar 31, 2025
18 checks passed

ArthurZucker mentioned this pull request Mar 31, 2025

Add Granite Speech Support #36801

Merged

5 tasks
