
Add MLCD model #36182


Merged
65 commits merged into huggingface:main on Apr 15, 2025

Conversation

@tanhuajie (Contributor) commented Feb 13, 2025

What does this PR do?

This PR adds the MLCD model from the DeepGlint-AI team.

Fixes #36181

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts @qubvel @ArthurZucker

Quick Test

from transformers import AutoProcessor, MLCDVisionModel
from PIL import Image
import requests

# Load model and processor
model = MLCDVisionModel.from_pretrained("DeepGlint-AI/mlcd-vit-bigG-patch14-336")
processor = AutoProcessor.from_pretrained("DeepGlint-AI/mlcd-vit-bigG-patch14-336")

# Process single image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# Get visual features
outputs = model(**inputs)
features = outputs.last_hidden_state

print(f"Extracted features shape: {features.shape}")

@tanhuajie (Contributor Author)

[screenshot: CI failure log]

Hi, Pavel @qubvel. Hope this message finds you well. It appears that our PR failed on the final CI step. From the error messages, the failure stems from the RT-DETR tests rather than from our code. Could you please guide us on how to disable or skip the tests for other models so we can complete the CI process for our PR? Thanks!

@anxiangsir commented Feb 14, 2025

Hi Pavel and Arthur, @qubvel @Rocketknight1:

Could you please take a moment to review the pull request? Your insights would be immensely appreciated and would greatly contribute to ensuring the quality of the changes. We're truly grateful for your help!

Thank you so much!

@Rocketknight1 (Member)

Gentle ping @qubvel, but let me know if you want me to take any part of the review!

@qubvel self-requested a review February 26, 2025 11:04
@qubvel (Member) left a comment

Hey @tanhuajie! Sorry for the delay, and thanks a lot for working on the model addition to transformers. Great work, and it already looks super clean!

I see the model is built on pretty standard modules, so it would be incredibly helpful if you could reuse library modules with inheritance using our new modular tool.

See other comments below!

@tanhuajie (Contributor Author)

Re the change, the idea is to use the `position_embeddings` argument rather than `rotary_pos_emb`.

        if position_embeddings is None:
            logger.warning_once(
                "The attention layers in this model are transitioning from computing the RoPE embeddings internally "
                "through `rotary_pos_emb` (2D tensor of RoPE theta values), to using externally computed "
                "`position_embeddings` (Tuple of tensors, containing cos and sin). In v4.54 `rotary_pos_emb` will be "
                "removed and `position_embeddings` will be mandatory."
            )
            emb = torch.cat((rotary_pos_emb, rotary_pos_emb), dim=-1)
            cos = emb.cos()
            sin = emb.sin()
        else:
            cos, sin = position_embeddings

So,

emb = torch.cat((rotary_pos_emb, rotary_pos_emb), dim=-1)
position_embeddings = (emb.cos(), emb.sin())

This should happen not inside the attention layer but inside the model itself, with the resulting (cos, sin) pair then propagated across the layers.
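
For illustration, a minimal sketch of the suggested pattern (hypothetical class and argument names, not the exact PR code): the model computes the (cos, sin) pair once and hands it to every attention layer.

import torch
from torch import nn

class VisionModelSketch(nn.Module):
    def __init__(self, layers, rotary_embedding):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.rotary_embedding = rotary_embedding  # module producing 2D RoPE theta values

    def forward(self, hidden_states):
        rotary_pos_emb = self.rotary_embedding(hidden_states)
        # compute cos/sin once, at the model level ...
        emb = torch.cat((rotary_pos_emb, rotary_pos_emb), dim=-1)
        position_embeddings = (emb.cos(), emb.sin())
        # ... then propagate the same tuple to every layer
        for layer in self.layers:
            hidden_states = layer(hidden_states, position_embeddings=position_embeddings)
        return hidden_states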

Okay, I'll make the revisions based on your suggestions later~

Hey @qubvel. I think it's done now. 🤗 Let me know if you have any feedback when you have a moment.

@qubvel (Member) commented Apr 9, 2025

run-slow: mlcd

github-actions bot (Contributor) commented Apr 9, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/mlcd']
quantizations: [] ...

@qubvel (Member) left a comment

Thanks for addressing comments 🤗

@qubvel (Member) commented Apr 9, 2025

cc @ArthurZucker for review when you have bandwidth

@qubvel requested a review from ArthurZucker April 9, 2025 17:33
@anxiangsir

Dear @ArthurZucker,

I hope this message finds you well! I just wanted to kindly follow up and ask if you could help review this whenever you have the time. Please don’t hesitate to let me know if there’s anything I can provide to assist. I truly appreciate your time and support!

@ArthurZucker (Collaborator) left a comment

Marvelous, thanks all for the great PR! 🤗

Comment on lines 147 to 148
class MLCDConfig(MLCDVisionConfig):
pass
@ArthurZucker (Collaborator):

this is weird to me, we should not have two configs that are exactly the same

@tanhuajie (Contributor Author) commented Apr 14, 2025

Hi, @ArthurZucker. The reason we provided a dummy MLCDConfig class is that when using the modular tool to create our custom MLCDEncoder by inheriting from CLIPEncoder, it automatically generates an unexpected MLCDConfig class. This generated configuration is identical to MLCDVisionConfig in all aspects except for the name, so we included the dummy MLCDConfig to handle this case.

If this approach isn't preferred, we can alternatively use the override method to eliminate the dummy MLCDConfig and retain only MLCDVisionConfig. Here's how to implement it:

class MLCDEncoder(CLIPEncoder):
    """
    Transformer encoder consisting of `config.num_hidden_layers` self-attention layers. Each layer is a
    [`MLCDEncoderLayer`].
    Args:
        config: MLCDVisionConfig
    """

    def __init__(self, config: MLCDVisionConfig):
        super().__init__(config)

This modification explicitly specifies MLCDVisionConfig as the expected configuration type. 🤗

self.num_key_value_groups = config.num_key_value_groups
self.is_causal = False

def forward(
@ArthurZucker (Collaborator):

very nice!
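
For context, the fragment above follows the library's pluggable attention-interface dispatch. A hedged sketch of that pattern (a forward() fragment with assumed attribute names, not the exact MLCD code):

from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS

# inside the attention module's forward():
attention_interface = eager_attention_forward  # module-level default in the modeling file
if self.config._attn_implementation != "eager":
    attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

attn_output, attn_weights = attention_interface(
    self,
    query_states,
    key_states,
    value_states,
    attention_mask,
    dropout=0.0 if not self.training else self.dropout,
    scaling=self.scaling,
)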

@ArthurZucker (Collaborator):

Very nice! Kudos 🤗

@qubvel (Member) commented Apr 14, 2025

Hey, @tanhuajie let's resolve conflicts with main, and we are good to merge it!

@tanhuajie (Contributor Author)

> Hey, @tanhuajie let's resolve conflicts with main, and we are good to merge it!

Great, I'll resolve the conflicts right away. 🤗

@tanhuajie (Contributor Author)

Hi @qubvel, I've resolved the conflicts, removed the dummy MLCDConfig, and refined _init_weights(). I think MLCD should now be ready to merge. 🤗 cc @anxiangsir @ArthurZucker

@qubvel (Member) commented Apr 15, 2025

run-slow: mlcd

github-actions bot (Contributor) commented Apr 15, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/mlcd']
quantizations: [] ...

@qubvel (Member) commented Apr 15, 2025

Thanks a lot for iterating on the model addition! Great work 🤗
Merging 🚀

@qubvel merged commit 6f7ea1c into huggingface:main Apr 15, 2025
19 checks passed
def get_input_embeddings(self) -> nn.Module:
return self.vision_model.embeddings.patch_embedding

@add_start_docstrings_to_model_forward(MLCD_VISION_INPUTS_DOCSTRING)
Contributor:

CI complains:

ValueError: The deployment of the documentation will fail because of the following errors:
The docstring of MLCDVisionModel.forward comports the following issue(s) and needs fixing:
- The return block is empty.

Should @replace_return_docstrings(output_type=BaseModelOutputWithPooling, config_class="MLCDVisionConfig") be added?
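
For reference, a hedged sketch of what applying the suggested decorator could look like (placeholder docstring constant and signature; the actual fix landed in #37527, mentioned below):

from transformers.modeling_outputs import BaseModelOutputWithPooling
from transformers.utils import add_start_docstrings_to_model_forward, replace_return_docstrings

MLCD_VISION_INPUTS_DOCSTRING = r"..."  # as defined in the modeling file

class MLCDVisionModelSketch:  # stand-in for the real MLCDVisionModel
    @add_start_docstrings_to_model_forward(MLCD_VISION_INPUTS_DOCSTRING)
    @replace_return_docstrings(output_type=BaseModelOutputWithPooling, config_class="MLCDVisionConfig")
    def forward(self, pixel_values=None, output_attentions=None, output_hidden_states=None, return_dict=None):
        r"""
        Returns:
        """
        ...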

@tanhuajie (Contributor Author):

Thank you for the reminder. I'll take a look at how to resolve the issue you mentioned.

Contributor:

No worries, thanks for the great work!! It's solved in #37527. Thanks @qubvel!! ❤️

@tanhuajie (Contributor Author):

Thanks for the fix ❤️

cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
* Add MLCD model

* Update codes for auto-mapping

* Add test scripts for MLCD

* Update doc for MLCD model

* Fix import error

* Fix import error

* Fix CI error for attention_outputs

* Fix code style for CI

* Fix code style for CI

* Fix code style for CI

* Fix code style for CI

* Fix code style for CI

* Fix CI error for initialization

* Fix code style for CI

* Fix code style for CI

* Reformat codes and docs for CI test

* Reformat codes and docs for CI test

* Remove unused attributes for CI test

* Fix style for CI test

* List MLCD in flash_attn doc

* Fix: typos, modulars, refactors from suggestions

* Refactoring convert_mlcd_weights_to_hf.py from suggestions

* Fix: docs conflicts

* Fix error for CI test

* Fix style for CI test

* Add integration test for MLCD

* Refactoring by class inheritance

* Fix: refactor attention interface, adjust codes

* Fix: merging conflicts

* Fix: merging conflicts

* Fix: style for CI test

* Fix: style for CI test

* Fix: set test_resize_embeddings to be False

* Fix: initializer for CI test

* Fix: conflicts, CI test, warning and refactoring

* Fix: merging conflicts

* Refactor

* Update docs

* Fix mistakes

* Remove unused args and fix multi-gpu error

* Revert position_embeddings

* Solve conflicts

* Solve conflicts

* Remove dummy

* Update _init_weights

* Update _init_weights

* Update _init_weights for CI test
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025