[Model] DeepseekV2 Support #499
base: main
Conversation
Add deepseekv2 convergence test
…war/Liger-Kernel into feature/deepseekv2
@ByronHsu @yundai424 @Tcc0403 @qingquansong The DeepSeek rotary embedding differs from the llama one:

deepseek:

```python
cos = cos[position_ids].unsqueeze(unsqueeze_dim)
sin = sin[position_ids].unsqueeze(unsqueeze_dim)
b, h, s, d = q.shape
q = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)
b, h, s, d = k.shape
k = k.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)
q_embed = (q * cos) + (rotate_half(q) * sin)
k_embed = (k * cos) + (rotate_half(k) * sin)
return q_embed, k_embed
```

llama:

```python
cos = cos.unsqueeze(unsqueeze_dim)
sin = sin.unsqueeze(unsqueeze_dim)
q_embed = (q * cos) + (rotate_half(q) * sin)
k_embed = (k * cos) + (rotate_half(k) * sin)
return q_embed, k_embed
```

I will create a separate PR to implement the DeepSeek rope.
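For illustration, a small self-contained sketch (not part of this PR) of what the extra `view`/`transpose`/`reshape` does to the head dimension:

```python
import torch

# Hypothetical illustration, not from this PR: DeepSeek-V2 stores rotary features
# interleaved along the head dimension, so its apply_rotary_pos_emb first
# de-interleaves them (even indices into the first half, odd indices into the
# second half) before the llama-style rotate_half is applied.
b, h, s, d = 1, 1, 1, 8
q = torch.arange(d, dtype=torch.float32).view(b, h, s, d)  # interleaved features 0..7

q_split = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)
print(q_split.flatten().tolist())  # [0.0, 2.0, 4.0, 6.0, 1.0, 3.0, 5.0, 7.0]
```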
```python
import sys

# Ensure the model is a DeepSeek model
if "deepseek" not in model.__class__.__module__:
```
Do deepseek and deepseek-v3 share the same architecture? If so, perhaps this function should be called `apply_liger_kernel_to_deepseek`; if not, perhaps we should strengthen this check.
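If the substring check is kept, a stricter variant might look like this sketch (the `model_type` value is an assumption based on the remote DeepSeek-V2 config, not part of this PR):

```python
# Hypothetical sketch, not the PR's code: key the guard on the config's model_type
# (assumed to be "deepseek_v2" in the remote DeepSeek-V2 code) rather than on the
# substring "deepseek", which would also match other or future DeepSeek variants.
def _is_deepseek_v2(model) -> bool:
    return getattr(getattr(model, "config", None), "model_type", None) == "deepseek_v2"
```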
```python
if model_name[:6] == "remote":
    revert_kwargs["remote_model_module"] = MINI_MODEL_SETUPS[model_name].remote_model_module

model = create_model(model_name).to(dtype).to(device)
```
Why the change to create the model before applying the patch?
I don't see a reason for the change either, unless you want to check that patching still works post model init :)
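For context, a toy sketch (illustrative names only, not Liger's code) of why the order matters for a module-level monkeypatch:

```python
# Toy illustration (not Liger's code): a module-level monkeypatch only affects
# models constructed *after* the patch is applied, which is why the tests normally
# patch first and then build the mini model; patching afterwards would have to
# rewrite the existing instance's submodules in place instead.
class RMSNorm:
    kind = "original"

class LigerRMSNorm(RMSNorm):
    kind = "fused"

def build_model():
    return RMSNorm()  # the name is resolved at call time

before = build_model()               # uses the original class
globals()["RMSNorm"] = LigerRMSNorm  # the "patch"
after = build_model()                # picks up the replacement

print(before.kind, after.kind)  # original fused
```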
```python
model_class = MINI_MODEL_SETUPS[model_name].model_class
return model_class(model_config)
if model_name[:6] == "remote":
    config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
```
Can you explain why this is necessary? Is it because the model cannot be run without `trust_remote_code`? As is, this default opts anyone who runs these unit tests into running remote code on their machine, which is a red flag.
I think a preferable path would be to add deepseekv2 to the transformers library and then add it to Liger, so that `trust_remote_code` is not necessary. This also has the benefit of making it easier to follow changes made to the underlying model, which is a common source of bugs in Liger.
It looks like support for deepseekv2 is underway (maybe stalled though): huggingface/transformers#31976
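For reference, a sketch of what the test setup could look like once native support lands (the class name is assumed from the pending transformers PR and is not currently importable):

```python
# Hypothetical sketch, assuming DeepseekV2Config ships in transformers once the PR
# above is merged: the mini test config could then be built locally, with no remote
# repository and no trust_remote_code.
from transformers import AutoModelForCausalLM, DeepseekV2Config  # assumed future import

mini_config = DeepseekV2Config(
    hidden_size=256,
    num_hidden_layers=2,
    num_attention_heads=4,
)
mini_model = AutoModelForCausalLM.from_config(mini_config)
```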
```python
model_class = MINI_MODEL_SETUPS[model_name].model_class
return model_class(model_config)
if model_name[:6] == "remote":
    config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
```
We ideally don't want tests to require internet access. Can we keep the remote code in the tests/resources folder for the time being? That would also let us sidestep the trust_remote_code issues.
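A sketch of what that could look like in the meantime (the vendored module path and constructor arguments are assumptions, mirroring the remote DeepSeek-V2 code):

```python
# Hypothetical sketch of the suggestion above, not the PR's code: vendor the
# DeepSeek-V2 configuration/modeling files under tests/resources and import them
# directly, so the mini model needs neither network access nor trust_remote_code.
from test.resources.deepseek_v2.configuration_deepseek import DeepseekV2Config
from test.resources.deepseek_v2.modeling_deepseek import DeepseekV2ForCausalLM

mini_config = DeepseekV2Config(
    hidden_size=256,         # shrunk for a mini convergence test
    num_hidden_layers=2,
    num_attention_heads=4,
    vocab_size=32000,
)
mini_model = DeepseekV2ForCausalLM(mini_config)
```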
```python
model_class = MINI_MODEL_SETUPS[model_name].model_class
return model_class(model_config)
if model_name[:6] == "remote":
    config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
```
see my previous comment
```python
if model_name[:6] == "remote":
    revert_kwargs["remote_model_module"] = MINI_MODEL_SETUPS[model_name].remote_model_module

model = create_model(model_name).to(dtype).to(device)
```
see previous
Summary
Resolves #129. Adds a monkeypatch to support the DeepseekV2 model.
Details
Ops patched:
Testing Done
- `make test` to ensure correctness
- `make checkstyle` to ensure code style
- `make test-convergence` to ensure convergence