Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create and Expose SamVisionModel as public for better accessibility #36493

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

geetu040
Copy link
Contributor

@geetu040 geetu040 commented Mar 2, 2025

What does this PR do?

This PR makes SamVisionEncoder publicly accessible while keeping its name unchanged. This improves usability by allowing users to instantiate the vision encoder directly, similar to how SamModel is used.

from transformers import SamVisionConfig, SamVisionEncoder
config = SamVisionConfig()
model = SamVisionEncoder(config)

Previously discussed here: #36248 (comment)

Before submitting

Who can review?

@amyeroberts, @qubvel, @zucchini-nlp

@github-actions github-actions bot marked this pull request as draft March 2, 2025 06:17
Copy link

github-actions bot commented Mar 2, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@geetu040
Copy link
Contributor Author

geetu040 commented Mar 2, 2025

Hi @qubvel and @zucchini-nlp

I have updated SamVisionEncoder to make it public, but I am having a problem making TFSamVisionEncoder public.

First, I am not even sure, if it is necessary to have TFSamVisionEncoder public, I am just following the test utils/check_repo.py which fails to find TFSamVisionEncoder

TFSamVisionEncoder saves the vision layers in attribute self.layers, this works fine when TFSamVisionEncoder inherits from keras.layers.Layer but fails when inherited from TFPreTrainedModel which inherits from keras.models.Model. Because keras.models.Model already has layers as a read-only-property.

Changing the name would break backward compatibility, we can either:

  1. completely disregard the idea of making SamVisionEncoder public and use modular where it is needed as backbone (in context of Deepseek-VL #36248).
  2. make SamVisionEncoder public and leave TFSamVisionEncoder as it as, and exempt this from relevant tests.

Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall!

For the TF model, you mean making the model a PreTrainedModel is causing issues or just making it public (i.e. adding in high level init)? I am not very familiar with TF models in transformers, but I think we can figure out a way. Maybe @qubvel has more expertise on that?

Keeping SamVIsionEncoder as pre-trained model is important, if we want to use it as vision backbone with VLMS. Especially useful for setting different load-time configuration for each backbone.

@geetu040 geetu040 marked this pull request as ready for review March 4, 2025 01:33
@geetu040
Copy link
Contributor Author

geetu040 commented Mar 4, 2025

@zucchini-nlp

I have added TFSamVisionEncoder to __init__, that fixes the failing check.

Though TFSamVisionEncoder still doesnot inherit TFSamPreTrainedModel and therefore is not added to auto modeling.
Fails to inherit because #36493 (comment)

TFSamVisionEncoder saves the vision layers in attribute self.layers, this works fine when TFSamVisionEncoder inherits from keras.layers.Layer but fails when inherited from TFPreTrainedModel which inherits from keras.models.Model. Because keras.models.Model already has layers as a read-only-property.


And by the way SamVisionEncoder works all fine. To conclude everything:

SamVisionEncoder

  • imported in __init__, available via from transformers import SamVisionEncoder
  • part of auto modeling
  • imports PreTrainedModel
  • available in docs

TFSamVisionEncoder

  • imported in __init__, available via from transformers import TFSamVisionEncoder
  • not part of auto modeling
  • doesnot import PreTrainedModel
  • not available in docs

@geetu040
Copy link
Contributor Author

geetu040 commented Mar 4, 2025

The failing checks at ci/circleci: tests_torch are fixed in #36422.

@geetu040 geetu040 requested a review from zucchini-nlp March 7, 2025 06:18
Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM and can be merged after one small iteration. Left a tiny comment. Also I see test_attention_outputs is failing for Sam Models currently, can you take a look at that?

@geetu040
Copy link
Contributor Author

geetu040 commented Mar 7, 2025

Also I see test_attention_outputs is failing for Sam Models currently, can you take a look at that?

They fail because of the SamVisionSdpaAttention which is fixed in #36422

@geetu040 geetu040 requested a review from zucchini-nlp March 7, 2025 09:46
Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! Then we'll wait until the blocking PR is merged to make CI green and can merge this one

@zucchini-nlp
Copy link
Member

@geetu040 can you rebase main, the blocker was merged already! I request review from qubvel, so we don't wait for core maintainer and unblock your for DeekSeek

@zucchini-nlp zucchini-nlp requested a review from qubvel March 17, 2025 10:14
Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @geetu040, thanks for working on the PR, I suppose to make it public we should add a VisionModel class for consistency with other models in the repo, see CLIP / SigLIP for example

@@ -135,7 +135,7 @@ def __init__(

class SamVisionConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`SamVisionModel`]. It is used to instantiate a SAM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have it named SamVisionModel for consistency, for example see CLIP/Siglip

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus we should add a SamVisionModel class which will be a subclass of PreTrainedModel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qubvel you are suggesting to rename SamVisionEncoder to SamVisionModel right? I hope it doesnot break backward compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, it breaks BC if we just rename it, for people who were loading the vision encoder class directly. We can either do something like SamVisionEncoder = SamVisionModel or add it separately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, renaming is going to be too breaking, it's better to add it as a separate class

@geetu040 geetu040 changed the title Make SamVisionEncoder public for better accessibility Create and Expose SamVisionModel as public for better accessibility Mar 17, 2025
@geetu040 geetu040 requested a review from qubvel March 17, 2025 17:20
@geetu040
Copy link
Contributor Author

@qubvel
SamVisionModel added alongside SamVisionEncoder

@geetu040 geetu040 force-pushed the sam-vision-encoder branch from 756f6f7 to a1652c6 Compare March 18, 2025 05:42
Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, just a small nit to make it simpler and reuse existing code

@geetu040 geetu040 requested a review from qubvel March 18, 2025 09:18
@geetu040
Copy link
Contributor Author

ping @qubvel for review

@geetu040
Copy link
Contributor Author

Hello @qubvel, a soft reminder for review

@qubvel
Copy link
Member

qubvel commented Mar 26, 2025

Thanks for the ping @geetu040, will review today

Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looks good to me! Thanks for working on it.

P.S. Test failures are unrelated.

cc @ArthurZucker to confirm and merge

@qubvel qubvel requested a review from ArthurZucker March 27, 2025 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants