Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adding DPT #1079

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

vedantdalimkar
Copy link

@vedantdalimkar vedantdalimkar commented Mar 2, 2025

Tried to address issue #1073

  1. Added a TimmViTEncoder class in the encoder package to support ViT models for encoder.
  2. Added DPT model architecture.

I have 1 concern: Right now the model only works for images having the same resolution as the original input resolution on which a particular ViT encoder was trained. Is this behaviour okay or should I change the functionality so that a dynamic image resolution is supported?

I have tested various ViT encoders at different encoder depths and the model seems to run correctly.
@qubvel Please let me know if you feel I should make any changes.

@qubvel
Copy link
Collaborator

qubvel commented Mar 3, 2025

Hi @vedantdalimkar. Thanks a lot for the PR 🤗 I am a bit too busy this week, so will review it early next week. Thanks for your patience. Meanwhile, can you please set up tests for the encoder and the DPT model? Please see how other models are tested

@vedantdalimkar
Copy link
Author

Hi @vedantdalimkar. Thanks a lot for the PR 🤗 I am a bit too busy this week, so will review it early next week. Thanks for your patience. Meanwhile, can you please set up tests for the encoder and the DPT model? Please see how other models are tested

Hi @qubvel. Sure, I will set up the relevant tests.

@vedantdalimkar
Copy link
Author

vedantdalimkar commented Mar 8, 2025

Hi @qubvel, did some refactoring. This commit should pass majority of the tests now. Had a few questions -

  1. The DPT model doesn't seem to torch compilable and scriptable since it has graph breaks. Should I skip those tests?
  2. Currently, the DPT model is not on HF hub which is required for the test_preserve_forward_output test, should I skip this test as well?

Some issues I faced with the default environment for the smp development.

  1. The TimmViTEncoder class that I have added requires the latest version of timm, so I have decorated all test functions with requires_timm_greater_or_equal function (similar to requires_torch_greater_or_equal). If possible, please use the latest timm version in the requirements so that you can run these tests or let me know if you want me to change this behaviour.
  2. The GeLU activation function used in the decoder requires torch version >= 2.0. However, the requirements have a torch version lesser than this. Can the requirements be updated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants