Skip to content

Create and Expose SamVisionModel as public for better accessibility #36493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Mar 31, 2025

Conversation

geetu040
Copy link
Contributor

@geetu040 geetu040 commented Mar 2, 2025

What does this PR do?

This PR makes SamVisionEncoder publicly accessible while keeping its name unchanged. This improves usability by allowing users to instantiate the vision encoder directly, similar to how SamModel is used.

from transformers import SamVisionConfig, SamVisionEncoder
config = SamVisionConfig()
model = SamVisionEncoder(config)

Previously discussed here: #36248 (comment)

Before submitting

Who can review?

@amyeroberts, @qubvel, @zucchini-nlp

@github-actions github-actions bot marked this pull request as draft March 2, 2025 06:17
Copy link
Contributor

github-actions bot commented Mar 2, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@geetu040
Copy link
Contributor Author

geetu040 commented Mar 2, 2025

Hi @qubvel and @zucchini-nlp

I have updated SamVisionEncoder to make it public, but I am having a problem making TFSamVisionEncoder public.

First, I am not even sure, if it is necessary to have TFSamVisionEncoder public, I am just following the test utils/check_repo.py which fails to find TFSamVisionEncoder

TFSamVisionEncoder saves the vision layers in attribute self.layers, this works fine when TFSamVisionEncoder inherits from keras.layers.Layer but fails when inherited from TFPreTrainedModel which inherits from keras.models.Model. Because keras.models.Model already has layers as a read-only-property.

Changing the name would break backward compatibility, we can either:

  1. completely disregard the idea of making SamVisionEncoder public and use modular where it is needed as backbone (in context of Deepseek-VL #36248).
  2. make SamVisionEncoder public and leave TFSamVisionEncoder as it as, and exempt this from relevant tests.

Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall!

For the TF model, you mean making the model a PreTrainedModel is causing issues or just making it public (i.e. adding in high level init)? I am not very familiar with TF models in transformers, but I think we can figure out a way. Maybe @qubvel has more expertise on that?

Keeping SamVIsionEncoder as pre-trained model is important, if we want to use it as vision backbone with VLMS. Especially useful for setting different load-time configuration for each backbone.

@geetu040 geetu040 marked this pull request as ready for review March 4, 2025 01:33
@geetu040
Copy link
Contributor Author

geetu040 commented Mar 4, 2025

@zucchini-nlp

I have added TFSamVisionEncoder to __init__, that fixes the failing check.

Though TFSamVisionEncoder still doesnot inherit TFSamPreTrainedModel and therefore is not added to auto modeling.
Fails to inherit because #36493 (comment)

TFSamVisionEncoder saves the vision layers in attribute self.layers, this works fine when TFSamVisionEncoder inherits from keras.layers.Layer but fails when inherited from TFPreTrainedModel which inherits from keras.models.Model. Because keras.models.Model already has layers as a read-only-property.


And by the way SamVisionEncoder works all fine. To conclude everything:

SamVisionEncoder

  • imported in __init__, available via from transformers import SamVisionEncoder
  • part of auto modeling
  • imports PreTrainedModel
  • available in docs

TFSamVisionEncoder

  • imported in __init__, available via from transformers import TFSamVisionEncoder
  • not part of auto modeling
  • doesnot import PreTrainedModel
  • not available in docs

@geetu040
Copy link
Contributor Author

geetu040 commented Mar 4, 2025

The failing checks at ci/circleci: tests_torch are fixed in #36422.

Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @geetu040, thanks for working on the PR, I suppose to make it public we should add a VisionModel class for consistency with other models in the repo, see CLIP / SigLIP for example

@@ -135,7 +135,7 @@ def __init__(

class SamVisionConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`SamVisionModel`]. It is used to instantiate a SAM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have it named SamVisionModel for consistency, for example see CLIP/Siglip

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus we should add a SamVisionModel class which will be a subclass of PreTrainedModel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qubvel you are suggesting to rename SamVisionEncoder to SamVisionModel right? I hope it doesnot break backward compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, it breaks BC if we just rename it, for people who were loading the vision encoder class directly. We can either do something like SamVisionEncoder = SamVisionModel or add it separately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, renaming is going to be too breaking, it's better to add it as a separate class

@geetu040 geetu040 changed the title Make SamVisionEncoder public for better accessibility Create and Expose SamVisionModel as public for better accessibility Mar 17, 2025
@geetu040 geetu040 requested a review from qubvel March 17, 2025 17:20
@geetu040
Copy link
Contributor Author

@qubvel
SamVisionModel added alongside SamVisionEncoder

@geetu040 geetu040 force-pushed the sam-vision-encoder branch from 756f6f7 to a1652c6 Compare March 18, 2025 05:42
Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, just a small nit to make it simpler and reuse existing code

@geetu040 geetu040 requested a review from qubvel March 18, 2025 09:18
@geetu040
Copy link
Contributor Author

ping @qubvel for review

@geetu040
Copy link
Contributor Author

Hello @qubvel, a soft reminder for review

@qubvel
Copy link
Member

qubvel commented Mar 26, 2025

Thanks for the ping @geetu040, will review today

Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looks good to me! Thanks for working on it.

P.S. Test failures are unrelated.

cc @ArthurZucker to confirm and merge

@qubvel qubvel requested a review from ArthurZucker March 27, 2025 14:55
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 🤗

@ArthurZucker ArthurZucker merged commit 0710e9b into huggingface:main Mar 31, 2025
18 checks passed
@ArthurZucker ArthurZucker mentioned this pull request Mar 31, 2025
5 tasks
zhouksh pushed a commit to zhouksh/transformers that referenced this pull request Mar 31, 2025
…uggingface#36493)

* move encoder below

* auto modeling

* write SamVisionTester

* fix vision attention shape

* fix SamVisionTest

* minor changes to SamVisionTest

* Revert "fix vision attention shape"

This reverts commit d2a4083.

* fix attention output shape in new tests

* remove encoder examples

* run modular on got_ocr2

* code formatting

* fix got_ocr2

* ruff fixes

* code quality

* add sam_vision in auto modeling and auto configuration

* remove composite test

* updated index.md

* add TFSamVisionEncoder to __init__

* fix public TFSamVisionEncoder

* remove outdated todo comment

* set test_torch_exportable

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* rename: VisionEncoder -> VisionModel

* bring back original SamVisionEncoder

* rename back: VisionEncoderOutput -> VisionModelOutput

* undo changes in SamModelTester

* reuse SamVisionEncoder in SamVisionModel

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
zhouksh pushed a commit to zhouksh/transformers that referenced this pull request Apr 1, 2025
…uggingface#36493)

* move encoder below

* auto modeling

* write SamVisionTester

* fix vision attention shape

* fix SamVisionTest

* minor changes to SamVisionTest

* Revert "fix vision attention shape"

This reverts commit d2a4083.

* fix attention output shape in new tests

* remove encoder examples

* run modular on got_ocr2

* code formatting

* fix got_ocr2

* ruff fixes

* code quality

* add sam_vision in auto modeling and auto configuration

* remove composite test

* updated index.md

* add TFSamVisionEncoder to __init__

* fix public TFSamVisionEncoder

* remove outdated todo comment

* set test_torch_exportable

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* rename: VisionEncoder -> VisionModel

* bring back original SamVisionEncoder

* rename back: VisionEncoderOutput -> VisionModelOutput

* undo changes in SamModelTester

* reuse SamVisionEncoder in SamVisionModel

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
dmdaksh pushed a commit to dmdaksh/transformers that referenced this pull request Apr 2, 2025
…uggingface#36493)

* move encoder below

* auto modeling

* write SamVisionTester

* fix vision attention shape

* fix SamVisionTest

* minor changes to SamVisionTest

* Revert "fix vision attention shape"

This reverts commit d2a4083.

* fix attention output shape in new tests

* remove encoder examples

* run modular on got_ocr2

* code formatting

* fix got_ocr2

* ruff fixes

* code quality

* add sam_vision in auto modeling and auto configuration

* remove composite test

* updated index.md

* add TFSamVisionEncoder to __init__

* fix public TFSamVisionEncoder

* remove outdated todo comment

* set test_torch_exportable

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* rename: VisionEncoder -> VisionModel

* bring back original SamVisionEncoder

* rename back: VisionEncoderOutput -> VisionModelOutput

* undo changes in SamModelTester

* reuse SamVisionEncoder in SamVisionModel

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
zucchini-nlp pushed a commit to BakerBunker/transformers that referenced this pull request Apr 2, 2025
…uggingface#36493)

* move encoder below

* auto modeling

* write SamVisionTester

* fix vision attention shape

* fix SamVisionTest

* minor changes to SamVisionTest

* Revert "fix vision attention shape"

This reverts commit d2a4083.

* fix attention output shape in new tests

* remove encoder examples

* run modular on got_ocr2

* code formatting

* fix got_ocr2

* ruff fixes

* code quality

* add sam_vision in auto modeling and auto configuration

* remove composite test

* updated index.md

* add TFSamVisionEncoder to __init__

* fix public TFSamVisionEncoder

* remove outdated todo comment

* set test_torch_exportable

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* rename: VisionEncoder -> VisionModel

* bring back original SamVisionEncoder

* rename back: VisionEncoderOutput -> VisionModelOutput

* undo changes in SamModelTester

* reuse SamVisionEncoder in SamVisionModel

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
…uggingface#36493)

* move encoder below

* auto modeling

* write SamVisionTester

* fix vision attention shape

* fix SamVisionTest

* minor changes to SamVisionTest

* Revert "fix vision attention shape"

This reverts commit d2a4083.

* fix attention output shape in new tests

* remove encoder examples

* run modular on got_ocr2

* code formatting

* fix got_ocr2

* ruff fixes

* code quality

* add sam_vision in auto modeling and auto configuration

* remove composite test

* updated index.md

* add TFSamVisionEncoder to __init__

* fix public TFSamVisionEncoder

* remove outdated todo comment

* set test_torch_exportable

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* rename: VisionEncoder -> VisionModel

* bring back original SamVisionEncoder

* rename back: VisionEncoderOutput -> VisionModelOutput

* undo changes in SamModelTester

* reuse SamVisionEncoder in SamVisionModel

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants