
【Infer】MLA matrix absorption separation #10249

Merged — 2 commits merged into develop on Mar 26, 2025

Conversation

@ckl117 (Contributor) commented on Mar 21, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --file XXXX.py
  • Add test cases into the tests folder. If there are codecov issues, please add test cases first.

PR types

Performance optimization

PR changes

Others

Description

Separate DeepSeek's MLA matrix absorption to reduce GPU memory usage and improve peak throughput.
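For context, the general idea behind MLA matrix absorption is to keep the KV cache in DeepSeek's compressed latent space and fold the per-head K up-projection into the query (e.g. via a batched GEMM) instead of decompressing every cached token. The sketch below illustrates only that general idea with NumPy and hypothetical shapes (num_heads, d_nope, d_latent, seq_len are made up); it is not the PR's actual Paddle implementation.

# Illustrative sketch only (NumPy, hypothetical shapes) -- not the PR's Paddle code.
import numpy as np

num_heads, d_nope, d_latent, seq_len = 16, 128, 512, 64

q_nope    = np.random.randn(num_heads, 1, d_nope)         # per-head query at one decode step
w_uk      = np.random.randn(num_heads, d_nope, d_latent)  # per-head K up-projection
kv_latent = np.random.randn(seq_len, d_latent)            # compressed (latent) KV cache

# Naive path: decompress the whole cache to full per-head keys, then score.
k_full = np.einsum('sl,hdl->hsd', kv_latent, w_uk)        # (heads, seq, d_nope) -- large intermediate
scores_naive = np.einsum('hqd,hsd->hqs', q_nope, k_full)

# Absorbed path: fold w_uk into the query with a batched matmul,
# then score directly against the small latent cache.
q_absorbed = np.matmul(q_nope, w_uk)                      # (heads, 1, d_latent)
scores_absorbed = np.einsum('hql,sl->hqs', q_absorbed, kv_latent)

assert np.allclose(scores_naive, scores_absorbed)

Both paths produce the same attention scores, but the absorbed path never materializes the full per-head keys, which is where the memory saving comes from.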


paddle-bot bot commented Mar 21, 2025

Thanks for your contribution!

@ckl117 changed the title from 【Infer】bf16 batch gemm to 【Infer】MLA matrix absorption separation on Mar 21, 2025

codecov bot commented Mar 21, 2025

Codecov Report

Attention: Patch coverage is 0% with 70 lines in your changes missing coverage. Please review.

Project coverage is 49.96%. Comparing base (d1e156a) to head (63f3e2f).
Report is 11 commits behind head on develop.

Files with missing lines                                  | Patch % | Lines
...erimental/transformers/fused_transformer_layers.py     | 0.00%   | 56 Missing ⚠️
.../experimental/transformers/deepseek_v2/modeling.py     | 0.00%   | 14 Missing ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (49.96%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #10249      +/-   ##
===========================================
+ Coverage    49.70%   49.96%   +0.25%     
===========================================
  Files          761      761              
  Lines       124218   124105     -113     
===========================================
+ Hits         61744    62009     +265     
+ Misses       62474    62096     -378     

☔ View full report in Codecov by Sentry.

@yuanlehome (Collaborator) commented:

Please also update the bf16/wint8 model graph construction accordingly.

@yuanlehome yuanlehome self-requested a review March 25, 2025 06:07
@ckl117 force-pushed the develop_absorption_batch_gemm_bf16 branch from 80fea35 to 63f3e2f on March 25, 2025 12:00
@yuanlehome (Collaborator) left a comment:
LGTM

@ZHUI ZHUI merged commit a0c08ba into PaddlePaddle:develop Mar 26, 2025
10 of 12 checks passed
ckl117 added a commit to ckl117/PaddleNLP that referenced this pull request Mar 27, 2025
* bf16 batch gemm

* bf16 and wint8 matrix_absorption
yuanlehome added a commit that referenced this pull request Mar 27, 2025
* 【Infer】MLA matrix absorption separation (#10249)

* bf16 batch gemm

* bf16 and wint8 matrix_absorption

* [MLA] move compute_out_linear out and fix bug when q_lora_rank is None (#10275)

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>