Add MLCD model #36182

Open · wants to merge 42 commits into main

Conversation

@tanhuajie commented Feb 13, 2025

What does this PR do?

This PR adds the MLCD model from the DeepGlint-AI team.

Fixes #36181

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts @qubvel @ArthurZucker

Quick Test

from transformers import AutoProcessor, MLCDVisionModel
from PIL import Image
import requests

# Load model and processor
model = MLCDVisionModel.from_pretrained("DeepGlint-AI/mlcd-vit-bigG-patch14-336")
processor = AutoProcessor.from_pretrained("DeepGlint-AI/mlcd-vit-bigG-patch14-336")

# Process single image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# Get visual features
outputs = model(**inputs)
features = outputs.last_hidden_state

print(f"Extracted features shape: {features.shape}")

tanhuajie and others added 15 commits February 14, 2025 02:17
@tanhuajie (Author)

[screenshot: CI failure log]

Hi Pavel @qubvel, hope this message finds you well. Our PR failed at the final CI step, and from the error messages the failure appears to stem from the RT-DETR tests rather than from our code. Could you please guide us on how to disable or skip the tests for other models so our PR can pass CI? Thanks!

@anxiangsir commented Feb 14, 2025

Hi Pavel and Arthur, @qubvel @Rocketknight1:

Could you please take a moment to review the pull request? Your insights would be immensely appreciated and would greatly contribute to ensuring the quality of the changes. We're truly grateful for your help!

Thank you so much!

@Rocketknight1 (Member)

Gentle ping @qubvel, but let me know if you want me to take any part of the review!

@qubvel self-requested a review February 26, 2025 11:04
@qubvel (Member) left a comment

Hey @tanhuajie! Sorry for the delay, and thanks a lot for working on the model addition to transformers. Great work, it already looks super clean!

I see the model is built on pretty standard modules, so it would be incredibly helpful if you could reuse library modules with inheritance using our new modular tool.

See other comments below!

@@ -0,0 +1,383 @@
# coding=utf-8
Member:

File to remove?

Author:

Yes, this file is redundant and has been removed.

Comment on lines 72 to 90
def apply_rotary_pos_emb_vision(tensor: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
"""Applies Rotary Position Embedding to the given tensor for vision-related tasks.
Args:
tensor (torch.Tensor): The input tensor with shape (..., num_channels).
freqs (torch.Tensor): The frequency matrix computed from rotary embeddings,
typically obtained from MLCDRotaryEmbedding.
Returns:
torch.Tensor: The transformed tensor after applying rotary positional embeddings.
"""
orig_dtype = tensor.dtype
tensor = tensor.float()
cos = freqs.cos()
sin = freqs.sin()
cos = cos.unsqueeze(1).repeat(1, 1, 2).unsqueeze(0).float()
sin = sin.unsqueeze(1).repeat(1, 1, 2).unsqueeze(0).float()
output = (tensor * cos) + (rotate_half(tensor) * sin)
output = output.to(orig_dtype)
return output
Member:

let's reuse the same function as in other modeling files

def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    """Applies Rotary Position Embedding to the query and key tensors.

    Args:
        q (`torch.Tensor`): The query tensor.
        k (`torch.Tensor`): The key tensor.
        cos (`torch.Tensor`): The cosine part of the rotary embedding.
        sin (`torch.Tensor`): The sine part of the rotary embedding.
        position_ids (`torch.Tensor`, *optional*):
            Deprecated and unused.
        unsqueeze_dim (`int`, *optional*, defaults to 1):
            The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
            sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
            that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
            k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
            cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
            the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
    Returns:
        `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
    """
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

Author:

Thanks for this suggestion. The file modeling_mlcd.py has been refactored and regenerated using the modular tool (nice tool!!!), and this function is now reused.
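
For readers unfamiliar with the modular tool: a modular_mlcd.py file declares classes by inheriting from existing models, and modeling_mlcd.py is auto-generated from it. A minimal illustrative sketch (not the actual PR file; reusing CLIP's MLP for MLCDMLP is an assumption for illustration):

from transformers.models.clip.modeling_clip import CLIPMLP

# Inheriting with an empty body tells the modular converter to copy CLIPMLP
# into the generated modeling_mlcd.py under the MLCD name.
class MLCDMLP(CLIPMLP):
    pass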

Comment on lines 174 to 192
batch_size, seq_length, hidden_size = hidden_states.size()
# Each of shape: [batch_size, seq_length, num_heads, head_dim]
q = self.q_proj(hidden_states).reshape((batch_size, seq_length, self.num_heads, self.head_dim))
k = self.k_proj(hidden_states).reshape((batch_size, seq_length, self.num_heads, self.head_dim))
v = self.v_proj(hidden_states).reshape((batch_size, seq_length, self.num_heads, self.head_dim))
q = apply_rotary_pos_emb_vision(q, rotary_pos_emb)
k = apply_rotary_pos_emb_vision(k, rotary_pos_emb)
q = q.permute(0, 2, 1, 3).contiguous()
k = k.permute(0, 2, 1, 3).contiguous()
v = v.permute(0, 2, 1, 3).contiguous()
# q (batch_size, num_heads, seq_length, head_dim)
# k (batch_size, num_heads, seq_length, head_dim)
# v (batch_size, num_heads, seq_length, head_dim)
attn_output = F.scaled_dot_product_attention(q, k, v, None, dropout_p=0.0)
attn_output = attn_output.permute(2, 0, 1, 3).contiguous() # [seq_length, batch_size, num_heads, head_dim]
attn_output = attn_output.view(seq_length, batch_size, -1) # [seq_length, batch_size, embedding_dim]
attn_output = self.out_proj(attn_output)
attn_output = attn_output.permute(1, 0, 2).contiguous() # [batch_size, seq_length, embedding_dim]
return attn_output, None
Member:

let's reuse library patterns as much as possible, e.g. see LlamaAttention

Author:

Yep, this pattern has also been reused now.

Comment on lines 330 to 347
def rot_pos_emb(self, grid_thw):
pos_ids = []
for t, h, w in grid_thw:
hpos_ids = torch.arange(h).unsqueeze(1).expand(-1, w)
hpos_ids = hpos_ids.reshape(h, 1, w, 1)
hpos_ids = hpos_ids.permute(0, 2, 1, 3)
hpos_ids = hpos_ids.flatten()

wpos_ids = torch.arange(w).unsqueeze(0).expand(h, -1)
wpos_ids = wpos_ids.reshape(h, 1, w, 1)
wpos_ids = wpos_ids.permute(0, 2, 1, 3)
wpos_ids = wpos_ids.flatten()
pos_ids.append(torch.stack([hpos_ids, wpos_ids], dim=-1).repeat(t, 1))
pos_ids = torch.cat(pos_ids, dim=0)
max_grid_size = grid_thw[:, 1:].max()
rotary_pos_emb_full = self.vision_rotary_embedding(max_grid_size)
rotary_pos_emb = rotary_pos_emb_full[pos_ids].flatten(1)
return rotary_pos_emb
Member:

  1. Should we put it into the vision_rotary_embedding module?
  2. Can we avoid the loop? Is there any way to vectorize it? Otherwise, we can use compile_compatible_method_lru_cache (see rt_detr) to cache the position_ids creation.

Author:

  1. You are right. I have moved it into the MLCDRotaryEmbedding module.
  2. Completed. The process is now vectorized to avoid the loop (see the sketch below).
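
For illustration, a minimal sketch (not the exact PR code) of building the 2D position ids for a single h × w patch grid (t = 1) without a Python loop:

import torch

def build_2d_pos_ids(num_patches_height: int, num_patches_width: int) -> torch.Tensor:
    # meshgrid yields the row/column index of every patch in one shot
    hpos, wpos = torch.meshgrid(
        torch.arange(num_patches_height),
        torch.arange(num_patches_width),
        indexing="ij",
    )
    # Shape (h * w, 2): one (row, col) pair per patch, in raster order
    return torch.stack([hpos.flatten(), wpos.flatten()], dim=-1)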

def forward(
self,
pixel_values: Optional[torch.FloatTensor] = None,
# output_attentions: Optional[bool] = None,
Member:

No commented-out code please; we should support this argument.

Author:

Got it. This argument is supported as of the latest commit.

@@ -0,0 +1,520 @@
# coding=utf-8
# Copyright 2024 Mistral and the HuggingFace Inc. team. All rights reserved.
Member:

Everywhere

Suggested change
# Copyright 2024 Mistral and the HuggingFace Inc. team. All rights reserved.
# Copyright 2025 Mistral and the HuggingFace Inc. team. All rights reserved.

Author:

Fixed.



@require_torch
class MLCDVisionModelModelTest(ModelTesterMixin, unittest.TestCase):
Member:

Suggested change
class MLCDVisionModelModelTest(ModelTesterMixin, unittest.TestCase):
class MLCDVisionModelTest(ModelTesterMixin, unittest.TestCase):

Author:

Thanks. Fixed now.

@@ -0,0 +1,152 @@
# coding=utf-8
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
Member:

Suggested change
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.

Author:

Fixed.

Member:

Please add Integration tests as well

Author:

Got it. Integration tests for MLCD have been added in test_modeling_mlcd.py.
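
A minimal sketch of the kind of integration test meant here (class name and checked values are illustrative, not the actual test file):

import unittest

import requests
import torch
from PIL import Image

from transformers import AutoProcessor, MLCDVisionModel
from transformers.testing_utils import require_torch, require_vision, slow


@require_torch
@require_vision
class MLCDVisionModelIntegrationTest(unittest.TestCase):
    @slow
    def test_inference(self):
        checkpoint = "DeepGlint-AI/mlcd-vit-bigG-patch14-336"
        model = MLCDVisionModel.from_pretrained(checkpoint)
        processor = AutoProcessor.from_pretrained(checkpoint)

        url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        image = Image.open(requests.get(url, stream=True).raw)
        inputs = processor(images=image, return_tensors="pt")

        with torch.no_grad():
            outputs = model(**inputs)

        # One image in, one feature map out
        self.assertEqual(outputs.last_hidden_state.shape[0], 1)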

Comment on lines 39 to 52
from .original_vit_rope2d import RoPE2d_ViT_bigG_14_1024


def copy_attn_layer(hf_attn_layer, pt_attn_layer):
# self.in_proj = nn.Linear(dim, dim * 3, bias=True)
# self.out_proj = nn.Linear(dim, dim)

q_proj, k_proj, v_proj = pt_attn_layer.in_proj.weight.data.chunk(3, dim=0)
q_proj_bias, k_proj_bias, v_proj_bias = pt_attn_layer.in_proj.bias.data.chunk(3, dim=0)

out_proj_weights = pt_attn_layer.out_proj.weight
out_proj_bias = pt_attn_layer.out_proj.bias

hf_attn_layer.q_proj.weight.data = q_proj
Member:

Please see other conversion scripts to structure your code (I would recommend mllama, because it follows our new standards for conversion).

The general idea is that we do not use the 3rd-party model to convert weights; instead, we should (see the sketch below):

  1. load the state dict
  2. rename/split/transpose weights
  3. load the new state dict into the HF model to check that everything matches
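
A minimal sketch of that load/rename/verify pattern (key names and the mapping table here are hypothetical; the real ones live in the conversion script):

import torch

# Hypothetical old-key -> HF-key mapping; the real script defines the full table
ORIGINAL_TO_HF = {"conv1.weight": "vision_model.embeddings.patch_embedding.weight"}

def convert_state_dict(path: str) -> dict:
    state_dict = torch.load(path, map_location="cpu")
    new_state_dict = {}
    for old_key, value in state_dict.items():
        if old_key.endswith("in_proj.weight"):
            # Split the fused qkv projection into separate q/k/v weights
            prefix = old_key[: -len("in_proj.weight")]
            q, k, v = value.chunk(3, dim=0)
            new_state_dict[prefix + "q_proj.weight"] = q
            new_state_dict[prefix + "k_proj.weight"] = k
            new_state_dict[prefix + "v_proj.weight"] = v
        else:
            new_state_dict[ORIGINAL_TO_HF.get(old_key, old_key)] = value
    return new_state_dict

# Loading with strict=True makes any missing/unexpected key fail loudly:
# model.load_state_dict(convert_state_dict("mlcd_original.bin"), strict=True)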

Author:

Thanks for the reminder. convert_mlcd_weights_to_hf.py has been refactored with reference to both mllama and siglip2.

tanhuajie and others added 7 commits March 13, 2025 03:14
@tanhuajie (Author)

Hi @qubvel! Thanks for your suggestions. I've made the necessary modifications based on your advice and think it's ready for your review again. Here's a short summary of what we've done so far:

  1. The modeling_mlcd.py file has been refactored and regenerated using the new modular tool, and we've reused as many existing modules and functions from other modeling files as possible.
  2. The conversion script convert_mlcd_weights_to_hf.py has been refactored, following your recommendation to refer to both mllama and siglip2.
  3. The output_attentions argument in MLCDVisionModel is now supported in the latest commit, and some loops have been vectorized to improve GPU parallelism.
  4. Additional integration tests for MLCD have been added to test_modeling_mlcd.py.
  5. Fixed some typos.

Looking forward to your feedback on these updates. Much thanks!!!

@tanhuajie requested a review from @qubvel March 12, 2025 23:16
@qubvel (Member) left a comment

Hi @tanhuajie, thanks for addressing the comments and using modular! I left a few more comments on how to use inheritance a bit more. Other than that it looks pretty clean, great work 🤗

Comment on lines 356 to 363
attn_output = F.scaled_dot_product_attention(
query_states,
key_states,
value_states,
attn_mask=attention_mask,
dropout_p=self.dropout if self.training else 0.0,
)
attn_output = attn_output.permute(2, 0, 1, 3).contiguous() # [seq_length, batch_size, num_heads, head_dim]
Member:

Please see the Llama model for the new attention interface. In short, we should have one module and be able to switch between different attention implementations (sdpa/eager/fa2).

Author:

OK. Attention interface has been refactored in the latest commit.

Member:

Hey @tanhuajie, it looks like the attention interface was not refactored; we should use:

attention_interface: Callable = eager_attention_forward
        if self.config._attn_implementation != "eager":
            if self.config._attn_implementation == "sdpa" and kwargs.get("output_attentions", False):
                logger.warning_once(
                    "`torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to "
                    'eager attention. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
                )
            else:
                attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

        attn_output, attn_weights = attention_interface(
            self,
            query_states,
            key_states,
            value_states,
            attention_mask,
            dropout=0.0 if not self.training else self.attention_dropout,
            scaling=self.scaling,
            **kwargs,
        )
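
For context, eager_attention_forward is the plain softmax-attention fallback defined alongside this pattern (roughly as in modeling_llama.py; mask handling simplified here):

import torch

def eager_attention_forward(module, query, key, value, attention_mask, scaling, dropout=0.0, **kwargs):
    # Plain attention: scaled scores -> (masked) softmax -> dropout -> weighted sum of values
    attn_weights = torch.matmul(query, key.transpose(2, 3)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask[:, :, :, : key.shape[-2]]
    attn_weights = torch.nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
    attn_weights = torch.nn.functional.dropout(attn_weights, p=dropout, training=module.training)
    attn_output = torch.matmul(attn_weights, value).transpose(1, 2).contiguous()
    return attn_output, attn_weights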

Comment on lines 370 to 379
class MLCDEncoderLayer(nn.Module):
def __init__(self, config: MLCDVisionConfig):
super().__init__()
self.embed_dim = config.hidden_size
self.self_attn = MLCDSdpaAttention(config)
self.layer_norm1 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)
self.mlp = MLCDMLP(config)
self.layer_norm2 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)

def forward(
Member:

Init is similar to CLIP/Siglip encoder layer, right? Please re-use in modular.

class MLCDEncoderLayer(nn.Module):
    def __init__(self, config: MLCDVisionConfig):
        super().__init__()
        self.embed_dim = config.hidden_size
        self.self_attn = MLCDSdpaAttention(config)
        self.layer_norm1 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)
        self.mlp = MLCDMLP(config)
        self.layer_norm2 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)

-->

class MLCDEncoderLayer(CLIPEncoderLayer):

Author:

Got it. But I notice this modification would cause the config parameter in modeling_mlcd.py to be automatically generated as MLCDConfig by the modular tool, rather than the desired MLCDVisionConfig. Therefore, I ultimately made the following changes instead:

class MLCDEncoderLayer(CLIPEncoderLayer):
    def __init__(self, config: MLCDVisionConfig):
        super().__init__(config)

Comment on lines 566 to 574
twh = (1, pixel_values.size(3) // self.config.patch_size, pixel_values.size(2) // self.config.patch_size)
rotary_pos_emb = self.vision_rotary_embedding(torch.tensor([twh], device=pixel_values.device))
rotary_pos_emb = torch.cat([self.class_pos_emb, rotary_pos_emb], dim=0)

output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
Member:

Change the order:

Suggested change
twh = (1, pixel_values.size(3) // self.config.patch_size, pixel_values.size(2) // self.config.patch_size)
rotary_pos_emb = self.vision_rotary_embedding(torch.tensor([twh], device=pixel_values.device))
rotary_pos_emb = torch.cat([self.class_pos_emb, rotary_pos_emb], dim=0)
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
twh = (1, pixel_values.size(3) // self.config.patch_size, pixel_values.size(2) // self.config.patch_size)
rotary_pos_emb = self.vision_rotary_embedding(torch.tensor([twh], device=pixel_values.device))
rotary_pos_emb = torch.cat([self.class_pos_emb, rotary_pos_emb], dim=0)

Author:

Completed!

Comment on lines 666 to 678
class MLCDVisionModel(MLCDPreTrainedModel):
config_class = MLCDVisionConfig
main_input_name = "pixel_values"
_no_split_modules = ["MLCDEncoderLayer"]

def __init__(self, config: MLCDVisionConfig):
super().__init__(config)
self.vision_model = MLCDVisionTransformer(config)
# Initialize weights and apply final processing
self.post_init()

def get_input_embeddings(self) -> nn.Module:
return self.vision_model.embeddings.patch_embedding
Member:

Suggested change
class MLCDVisionModel(MLCDPreTrainedModel):
config_class = MLCDVisionConfig
main_input_name = "pixel_values"
_no_split_modules = ["MLCDEncoderLayer"]
def __init__(self, config: MLCDVisionConfig):
super().__init__(config)
self.vision_model = MLCDVisionTransformer(config)
# Initialize weights and apply final processing
self.post_init()
def get_input_embeddings(self) -> nn.Module:
return self.vision_model.embeddings.patch_embedding
class MLCDVisionModel(CLIPVisionModel):

Author:

Completed.

Comment on lines 544 to 556
class MLCDVisionTransformer(nn.Module):
def __init__(self, config: MLCDVisionConfig):
super().__init__()
self.config = config
embed_dim = config.hidden_size

self.embeddings = MLCDVisionEmbeddings(config)
self.pre_layrnorm = nn.LayerNorm(embed_dim, eps=config.layer_norm_eps)
self.encoder = MLCDEncoder(config)
self.post_layernorm = nn.LayerNorm(embed_dim, eps=config.layer_norm_eps)

self.vision_rotary_embedding = MLCDRotaryEmbedding(config.hidden_size // config.num_attention_heads // 2)
self.class_pos_emb = nn.Parameter(torch.randn(1, config.hidden_size // config.num_attention_heads // 2))
Member:

Suggested change
class MLCDVisionTransformer(nn.Module):
def __init__(self, config: MLCDVisionConfig):
super().__init__()
self.config = config
embed_dim = config.hidden_size
self.embeddings = MLCDVisionEmbeddings(config)
self.pre_layrnorm = nn.LayerNorm(embed_dim, eps=config.layer_norm_eps)
self.encoder = MLCDEncoder(config)
self.post_layernorm = nn.LayerNorm(embed_dim, eps=config.layer_norm_eps)
self.vision_rotary_embedding = MLCDRotaryEmbedding(config.hidden_size // config.num_attention_heads // 2)
self.class_pos_emb = nn.Parameter(torch.randn(1, config.hidden_size // config.num_attention_heads // 2))
class MLCDVisionTransformer(CLIPVisionTransformer):
def __init__(self, config: MLCDVisionConfig):
super().__init__(config)
self.vision_rotary_embedding = MLCDRotaryEmbedding(config.hidden_size // config.num_attention_heads // 2)
self.class_pos_emb = nn.Parameter(torch.randn(1, config.hidden_size // config.num_attention_heads // 2))

Author:

Completed.

Comment on lines 429 to 441
class MLCDEncoder(nn.Module):
"""
Transformer encoder consisting of `config.num_hidden_layers` self attention layers. Each layer is a
[`MLCDEncoderLayer`].
Args:
config: MLCDVisionConfig
"""

def __init__(self, config: MLCDVisionConfig):
super().__init__()
self.config = config
self.layers = nn.ModuleList([MLCDEncoderLayer(config) for _ in range(config.num_hidden_layers)])
self.gradient_checkpointing = False
Member:

Suggested change
class MLCDEncoder(nn.Module):
"""
Transformer encoder consisting of `config.num_hidden_layers` self attention layers. Each layer is a
[`MLCDEncoderLayer`].
Args:
config: MLCDVisionConfig
"""
def __init__(self, config: MLCDVisionConfig):
super().__init__()
self.config = config
self.layers = nn.ModuleList([MLCDEncoderLayer(config) for _ in range(config.num_hidden_layers)])
self.gradient_checkpointing = False
class MLCDEncoder(CLIPEncoder):

Comment on lines 168 to 187
class MLCDVisionEmbeddings(nn.Module):
def __init__(self, config: MLCDVisionConfig):
super().__init__()
self.config = config
self.embed_dim = config.hidden_size
self.image_size = config.image_size
self.patch_size = config.patch_size

self.class_embedding = nn.Parameter(torch.randn(self.embed_dim))

self.patch_embedding = nn.Conv2d(
in_channels=config.num_channels,
out_channels=self.embed_dim,
kernel_size=self.patch_size,
stride=self.patch_size,
bias=False,
)

self.num_patches = (self.image_size // self.patch_size) ** 2
self.num_positions = self.num_patches + 1
Member:

Suggested change
class MLCDVisionEmbeddings(nn.Module):
def __init__(self, config: MLCDVisionConfig):
super().__init__()
self.config = config
self.embed_dim = config.hidden_size
self.image_size = config.image_size
self.patch_size = config.patch_size
self.class_embedding = nn.Parameter(torch.randn(self.embed_dim))
self.patch_embedding = nn.Conv2d(
in_channels=config.num_channels,
out_channels=self.embed_dim,
kernel_size=self.patch_size,
stride=self.patch_size,
bias=False,
)
self.num_patches = (self.image_size // self.patch_size) ** 2
self.num_positions = self.num_patches + 1
class MLCDVisionEmbeddings(CLIPVisionEmbeddings):
def __init__(self, config: MLCDVisionConfig):
super().__init__(config)
del self.position_embedding

Author:

Completed.

tanhuajie and others added 2 commits March 14, 2025 08:03
@tanhuajie (Author) commented Mar 14, 2025

Hi @qubvel! Thank you for your further suggestions. Based on your latest comments, I have made additional modifications to modular_mlcd.py. These changes include improving the reusability of the modular components through class inheritance and refactoring the attention interface to support different attention implementations. 🤗

@tanhuajie requested a review from @qubvel March 14, 2025 00:49
@tanhuajie (Author)

Hi @qubvel, just a gentle reminder to review the latest commits. Your feedback is incredibly valuable, and I’d appreciate it if you could take a look when you have a moment. Thanks! 🤗

@qubvel (Member) left a comment

Thanks for addressing the comments! I think we are close to getting it done. I've added a few more comments; please have a look!

Comment on lines 501 to 502
def __init__(self, config: MLCDVisionConfig):
super().__init__(config)
Member:

Suggested change
def __init__(self, config: MLCDVisionConfig):
super().__init__(config)

Author:

You are right. I misunderstood it earlier. The latest commit has refactored the attention interface based on your suggestions!

def forward(self, grid_thw: torch.Tensor) -> torch.Tensor:
"""Calculate sequence length from grid, and then get the RoPE for MLCDVisionModel"""

t, h, w = grid_thw[:, 0], grid_thw[:, 1], grid_thw[:, 2]
Member:

Do we need t?

Member:

As far as I see it's always 1, isn't it?

Author:

Yes. I have adjusted them.

Comment on lines 628 to 629
twh = (1, pixel_values.size(3) // self.config.patch_size, pixel_values.size(2) // self.config.patch_size)
rotary_pos_emb = self.vision_rotary_embedding(torch.tensor([twh], device=pixel_values.device))
Member:

Would it be better to make it as follows, considering t is always 1?

Suggested change
twh = (1, pixel_values.size(3) // self.config.patch_size, pixel_values.size(2) // self.config.patch_size)
rotary_pos_emb = self.vision_rotary_embedding(torch.tensor([twh], device=pixel_values.device))
num_patches_height = pixel_values.shape[-2] // self.config.patch_size
num_patches_width = pixel_values.shape[-1] // self.config.patch_size
rotary_pos_emb = self.vision_rotary_embedding(num_patches_height, num_patches_width)

Comment on lines 152 to 153
def forward(self, grid_thw: torch.Tensor) -> torch.Tensor:
"""Calculate sequence length from grid, and then get the RoPE for MLCDVisionModel"""
Member:

Suggested change
def forward(self, grid_thw: torch.Tensor) -> torch.Tensor:
"""Calculate sequence length from grid, and then get the RoPE for MLCDVisionModel"""
def forward(self, num_patches_height: int, num_patches_width: int) -> torch.Tensor:
"""Calculate sequence length from grid, and then get the RoPE for MLCDVisionModel"""

test_pruning = False
test_head_masking = False
test_torchscript = False
test_resize_embeddings = False
Member:

Suggested change
test_resize_embeddings = False
test_resize_embeddings = False
test_torch_exportable = True

Author:

MLCD is a vision-only model and doesn't have a vocab_size that needs to be adjusted, so I think it's better to keep test_resize_embeddings = False.

tanhuajie and others added 8 commits March 20, 2025 02:40
@tanhuajie (Author)

Hey @qubvel, thanks for your comments! I have refactored the attention interface and adjusted the code based on your suggestions. I think it's ready for your review again. Please take a look when you have time! Thanks 🤗

@tanhuajie requested a review from @qubvel March 19, 2025 20:12
@anxiangsir

Hi @qubvel,

Thanks again for your valuable feedback earlier! We’ve refactored the attention interface and made adjustments to the code based on your suggestions. It should be ready for your review now. Whenever you have some time, we’d really appreciate it if you could take a look. Thank you so much!

@anxiangsir

Hi @qubvel, hope you're doing well! Just a quick follow-up on the updated interface based on your feedback; we'd love to hear your thoughts when you have time.

Thanks again!
