[Model] DeepseekV2 Support #499

Open · wants to merge 12 commits into main

Conversation

saurabhkoshatwar (Contributor) commented Dec 26, 2024

Summary

Resolves #129. Adds a monkeypatch to support the DeepseekV2 model.

Details

Ops patched:

  • rms_norm
  • swiglu
  • cross_entropy
  • fused_linear_cross_entropy
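
For reference, a minimal sketch of what such a patch can look like, modeled on the existing apply_liger_kernel_to_* functions; the function name and the patched class names below are illustrative assumptions, not necessarily what this PR uses:

    import sys

    from liger_kernel.transformers import (
        LigerCrossEntropyLoss,
        LigerRMSNorm,
        LigerSwiGLUMLP,
    )

    def apply_liger_kernel_to_deepseek_v2(model) -> None:
        # DeepseekV2 ships as remote code, so the modeling module is looked up
        # dynamically from the model instance instead of transformers.models.*.
        modeling = sys.modules[model.__class__.__module__]
        # Class names assumed from DeepSeek-V2's published modeling code.
        modeling.DeepseekV2RMSNorm = LigerRMSNorm          # rms_norm
        modeling.DeepseekV2MLP = LigerSwiGLUMLP            # swiglu
        modeling.CrossEntropyLoss = LigerCrossEntropyLoss  # cross_entropy
        # fused_linear_cross_entropy typically replaces the CausalLM forward;
        # omitted here for brevity.

Note that assigning module-level classes like this only affects models constructed afterwards; an already-built instance would need its submodules swapped in place.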

Testing Done

  • Hardware Type: NVIDIA A100-SXM4-40GB
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@saurabhkoshatwar saurabhkoshatwar marked this pull request as draft December 26, 2024 00:58
@saurabhkoshatwar saurabhkoshatwar marked this pull request as ready for review December 31, 2024 20:42
saurabhkoshatwar (Contributor, Author) commented Jan 7, 2025

@ByronHsu @yundai424 @Tcc0403 @qingquansong
As discussed in the issue, the RoPE implementation in DeepSeek differs from Llama's.

deepseek:

    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)

    # DeepSeek first de-interleaves the head dimension of q and k
    b, h, s, d = q.shape
    q = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    b, h, s, d = k.shape
    k = k.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

llama:

    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

I will create a separate PR to implement the DeepSeek RoPE.
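
Concretely, the extra view/transpose/reshape de-interleaves the last dimension so that llama's rotate_half pairing lines up with DeepSeek's interleaved layout. A tiny standalone illustration of just that reordering (shapes chosen arbitrarily):

    import torch

    b, h, s, d = 1, 1, 1, 8
    q = torch.arange(d).view(b, h, s, d)
    # [x0, x1, x2, x3, ...] -> [x0, x2, ..., x1, x3, ...]
    q_reordered = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)
    print(q.flatten().tolist())            # [0, 1, 2, 3, 4, 5, 6, 7]
    print(q_reordered.flatten().tolist())  # [0, 2, 4, 6, 1, 3, 5, 7]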

    import sys

    # Ensure the model is a DeepSeek model
    if "deepseek" not in model.__class__.__module__:
tyler-romero (Collaborator) commented Jan 15, 2025

Do deepseek and deepseek-v3 share the same architecture? If so, perhaps this function should be called apply_liger_kernel_to_deepseek; if not, perhaps we should strengthen this check.
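
For example, a stricter guard could key off the config's model_type rather than a substring of the module path; a sketch, assuming the remote config sets model_type to "deepseek_v2" as DeepSeek-V2's published code does:

    # Hypothetical stricter check than the "deepseek" substring match above.
    model_type = getattr(model.config, "model_type", None)
    if model_type != "deepseek_v2":
        raise ValueError(f"Expected a DeepseekV2 model, got model_type={model_type!r}")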

    if model_name[:6] == "remote":
        revert_kwargs["remote_model_module"] = MINI_MODEL_SETUPS[model_name].remote_model_module

    model = create_model(model_name).to(dtype).to(device)
Collaborator:

Why the change to create the model before applying the patch?

Collaborator:

I don't see a reason for the change either, unless you want to verify that patching still works after model init :)

    model_class = MINI_MODEL_SETUPS[model_name].model_class
    return model_class(model_config)
    if model_name[:6] == "remote":
        config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
tyler-romero (Collaborator) commented Jan 15, 2025

Can you explain why this is necessary? Is it because the model cannot be run without trust_remote_code? As is, this default opts anyone who runs these unit tests into running remote code on their machine, which is a red flag.

I think a preferable path would be to add deepseekv2 to the transformers library, then add it to Liger, so that trust_remote_code is not necessary.

This also has the benefit of making it easier to follow changes that are made to the underlying model, which is a common source of bugs in Liger.

Collaborator:

It looks like support for deepseekv2 is underway (maybe stalled though): huggingface/transformers#31976

    model_class = MINI_MODEL_SETUPS[model_name].model_class
    return model_class(model_config)
    if model_name[:6] == "remote":
        config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
Collaborator:

We ideally don't want tests to require internet access. Can we keep the remote code in the tests/resources folder for the time being? This would also let us avoid the trust_remote_code issues.
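
For example, with a vendored copy of the modeling code, the mini model could be built entirely offline; the import path and config values below are hypothetical:

    # Hypothetical: import vendored DeepseekV2 code from tests/resources
    # instead of fetching it from the Hub with trust_remote_code=True.
    from tests.resources.deepseek_v2.modeling_deepseek import (
        DeepseekV2Config,
        DeepseekV2ForCausalLM,
    )

    config = DeepseekV2Config(hidden_size=256, num_hidden_layers=2)  # mini-model sized
    model = DeepseekV2ForCausalLM(config)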

    if model_name[:6] == "remote":
        revert_kwargs["remote_model_module"] = MINI_MODEL_SETUPS[model_name].remote_model_module

    model = create_model(model_name).to(dtype).to(device)
Collaborator:

I don't see a reason for the change either, unless you want to verify that patching still works after model init :)

    model_class = MINI_MODEL_SETUPS[model_name].model_class
    return model_class(model_config)
    if model_name[:6] == "remote":
        config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
Collaborator:

see my previous comment

    if model_name[:6] == "remote":
        revert_kwargs["remote_model_module"] = MINI_MODEL_SETUPS[model_name].remote_model_module

    model = create_model(model_name).to(dtype).to(device)
Collaborator:

see previous

Successfully merging this pull request may close these issues: [feat] support for DeepseekV2 (#129)