-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create and Expose SamVisionModel as public for better accessibility #36493
base: main
Are you sure you want to change the base?
Conversation
This reverts commit d2a4083.
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the |
Hi @qubvel and @zucchini-nlp I have updated First, I am not even sure, if it is necessary to have
Changing the name would break backward compatibility, we can either:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall!
For the TF model, you mean making the model a PreTrainedModel
is causing issues or just making it public (i.e. adding in high level init)? I am not very familiar with TF models in transformers, but I think we can figure out a way. Maybe @qubvel has more expertise on that?
Keeping SamVIsionEncoder
as pre-trained model is important, if we want to use it as vision backbone with VLMS. Especially useful for setting different load-time configuration for each backbone.
I have added Though
And by the way
|
The failing checks at ci/circleci: tests_torch are fixed in #36422. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM and can be merged after one small iteration. Left a tiny comment. Also I see test_attention_outputs
is failing for Sam Models currently, can you take a look at that?
They fail because of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect! Then we'll wait until the blocking PR is merged to make CI green and can merge this one
@geetu040 can you rebase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @geetu040, thanks for working on the PR, I suppose to make it public we should add a VisionModel class for consistency with other models in the repo, see CLIP / SigLIP for example
@@ -135,7 +135,7 @@ def __init__( | |||
|
|||
class SamVisionConfig(PretrainedConfig): | |||
r""" | |||
This is the configuration class to store the configuration of a [`SamVisionModel`]. It is used to instantiate a SAM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have it named SamVisionModel
for consistency, for example see CLIP/Siglip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thus we should add a SamVisionModel
class which will be a subclass of PreTrainedModel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qubvel you are suggesting to rename SamVisionEncoder
to SamVisionModel
right? I hope it doesnot break backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, it breaks BC if we just rename it, for people who were loading the vision encoder class directly. We can either do something like SamVisionEncoder = SamVisionModel
or add it separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, renaming is going to be too breaking, it's better to add it as a separate class
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
@qubvel |
756f6f7
to
a1652c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update, just a small nit to make it simpler and reuse existing code
ping @qubvel for review |
Hello @qubvel, a soft reminder for review |
Thanks for the ping @geetu040, will review today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, looks good to me! Thanks for working on it.
P.S. Test failures are unrelated.
cc @ArthurZucker to confirm and merge
What does this PR do?
This PR makes
SamVisionEncoder
publicly accessible while keeping its name unchanged. This improves usability by allowing users to instantiate the vision encoder directly, similar to howSamModel
is used.Previously discussed here: #36248 (comment)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@amyeroberts, @qubvel, @zucchini-nlp