[Model] DeepseekV2 Support #499

Open · wants to merge 12 commits into main

Conversation

saurabhkoshatwar (Contributor) commented Dec 26, 2024

Summary

Resolves #129. Adds a monkeypatch to support the DeepseekV2 model.

Details

Ops patched:

  • rms_norm
  • swiglu
  • cross_entropy
  • fused_linear_cross_entropy
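
For reference, a minimal sketch of what such a patch can look like, modeled on the existing apply_liger_kernel_to_* functions; the function name and the patched class names below are illustrative assumptions, not necessarily what this PR uses:

    import sys

    from liger_kernel.transformers import (
        LigerCrossEntropyLoss,
        LigerRMSNorm,
        LigerSwiGLUMLP,
    )

    def apply_liger_kernel_to_deepseek_v2(model) -> None:
        # DeepseekV2 ships as remote code, so the modeling module is looked up
        # dynamically from the model instance instead of transformers.models.*.
        modeling = sys.modules[model.__class__.__module__]
        # Class names assumed from DeepSeek-V2's published modeling code.
        modeling.DeepseekV2RMSNorm = LigerRMSNorm          # rms_norm
        modeling.DeepseekV2MLP = LigerSwiGLUMLP            # swiglu
        modeling.CrossEntropyLoss = LigerCrossEntropyLoss  # cross_entropy
        # fused_linear_cross_entropy typically replaces the CausalLM forward;
        # omitted here for brevity.

Note that assigning module-level classes like this only affects models constructed afterwards; an already-built instance would need its submodules swapped in place.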

Testing Done

  • Hardware Type: NVIDIA A100-SXM4-40GB
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@saurabhkoshatwar saurabhkoshatwar marked this pull request as draft December 26, 2024 00:58
@saurabhkoshatwar saurabhkoshatwar marked this pull request as ready for review December 31, 2024 20:42
saurabhkoshatwar (Contributor, Author) commented Jan 7, 2025

@ByronHsu @yundai424 @Tcc0403 @qingquansong
As discussed in the issue, the RoPE implementation in DeepSeek differs from Llama's.

deepseek:

    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)

    # DeepSeek first de-interleaves the head dimension of q and k
    b, h, s, d = q.shape
    q = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    b, h, s, d = k.shape
    k = k.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

llama:

    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

I will create a separate PR to implement the DeepSeek RoPE.
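
Concretely, the extra view/transpose/reshape de-interleaves the last dimension so that llama's rotate_half pairing lines up with DeepSeek's interleaved layout. A tiny standalone illustration of just that reordering (shapes chosen arbitrarily):

    import torch

    b, h, s, d = 1, 1, 1, 8
    q = torch.arange(d).view(b, h, s, d)
    # [x0, x1, x2, x3, ...] -> [x0, x2, ..., x1, x3, ...]
    q_reordered = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)
    print(q.flatten().tolist())            # [0, 1, 2, 3, 4, 5, 6, 7]
    print(q_reordered.flatten().tolist())  # [0, 2, 4, 6, 1, 3, 5, 7]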

    import sys

    # Ensure the model is a DeepSeek model
    if "deepseek" not in model.__class__.__module__:
tyler-romero (Collaborator) commented Jan 15, 2025

Do deepseek and deepseek-v3 share the same architecture? If so, perhaps this function should be called apply_liger_kernel_to_deepseek; if not, perhaps we should strengthen this check.
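
For example, a stricter guard could key off the config's model_type rather than a substring of the module path; a sketch, assuming the remote config sets model_type to "deepseek_v2" as DeepSeek-V2's published code does:

    # Hypothetical stricter check than the "deepseek" substring match above.
    model_type = getattr(model.config, "model_type", None)
    if model_type != "deepseek_v2":
        raise ValueError(f"Expected a DeepseekV2 model, got model_type={model_type!r}")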

    if model_name[:6] == "remote":
        revert_kwargs["remote_model_module"] = MINI_MODEL_SETUPS[model_name].remote_model_module

    model = create_model(model_name).to(dtype).to(device)
Collaborator:

Why the change to create the model before applying the patch?

Collaborator:

I don't see a reason for the change either, unless you want to verify that patching still works after model init :)

    model_class = MINI_MODEL_SETUPS[model_name].model_class
    return model_class(model_config)
    if model_name[:6] == "remote":
        config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
tyler-romero (Collaborator) commented Jan 15, 2025

Can you explain why this is necessary? Is it because the model cannot be run without trust_remote_code? As is, this default opts anyone who runs these unit tests into running remote code on their machine, which is a red flag.

I think a preferable path would be to add deepseekv2 to the transformers library, then add it to Liger, so that trust_remote_code is not necessary.

This also has the benefit of making it easier to follow changes that are made to the underlying model, which is a common source of bugs in Liger.

Collaborator:

It looks like support for deepseekv2 is underway (maybe stalled though): huggingface/transformers#31976

    model_class = MINI_MODEL_SETUPS[model_name].model_class
    return model_class(model_config)
    if model_name[:6] == "remote":
        config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
Collaborator:

We ideally don't want tests to require internet access. Can we keep the remote code in the tests/resources folder for the time being? This would also let us avoid the trust_remote_code issues.
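
For example, with a vendored copy of the modeling code, the mini model could be built entirely offline; the import path and config values below are hypothetical:

    # Hypothetical: import vendored DeepseekV2 code from tests/resources
    # instead of fetching it from the Hub with trust_remote_code=True.
    from tests.resources.deepseek_v2.modeling_deepseek import (
        DeepseekV2Config,
        DeepseekV2ForCausalLM,
    )

    config = DeepseekV2Config(hidden_size=256, num_hidden_layers=2)  # mini-model sized
    model = DeepseekV2ForCausalLM(config)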

    if model_name[:6] == "remote":
        revert_kwargs["remote_model_module"] = MINI_MODEL_SETUPS[model_name].remote_model_module

    model = create_model(model_name).to(dtype).to(device)
Collaborator:

I don't see a reason for the change either, unless you want to verify that patching still works after model init :)

    model_class = MINI_MODEL_SETUPS[model_name].model_class
    return model_class(model_config)
    if model_name[:6] == "remote":
        config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
Collaborator:

see my previous comment

    if model_name[:6] == "remote":
        revert_kwargs["remote_model_module"] = MINI_MODEL_SETUPS[model_name].remote_model_module

    model = create_model(model_name).to(dtype).to(device)
Collaborator:

see previous

Successfully merging this pull request may close these issues: [feat] support for DeepseekV2 (#129)