[slice]support different shape case for GPUScatterAdd op #73971

Merged
merged 5 commits into from
Jul 15, 2025

@zhanghonggeng zhanghonggeng commented Jul 10, 2025

PR Category

Performance Optimization

PR Types

Improvements

Description

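Background: taking input shape (108, 64, 12288), axis: 0, index: input_shape[axis] as an example, the gather backward pass is 60% slower than torch, so this PR optimizes gather_grad.
It implements a GPUScatterAdd kernel to replace GPUScatterAssign. The GPUScatterAdd kernel supports strides: the stride computation turns the in-kernel index calculation into base address + offset, which simplifies the complex index arithmetic inside the kernel. This yields a 60% performance improvement in the case above.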
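The base address + offset idea can be illustrated with a small CPU sketch. This is not the PR's kernel: the function name `scatter_add_flat`, the contiguous row-major layout, and the sequential loop standing in for GPU threads are all assumptions made for illustration.

```python
from math import prod


def scatter_add_flat(grad_out, index, x_shape, axis):
    """Accumulate a flat, row-major grad_out into a flat buffer shaped x_shape.

    Hypothetical reference routine: each loop iteration plays the role of
    one GPU thread, and the += would need to be an atomic add on a GPU.
    """
    dim = x_shape[axis]
    inner = prod(x_shape[axis + 1:])      # stride of `axis` when contiguous
    src_outer_stride = len(index) * inner  # one outer step in grad_out
    dst_outer_stride = dim * inner         # one outer step in grad_x
    grad_x = [0.0] * prod(x_shape)
    for i, g in enumerate(grad_out):
        outer, rem = divmod(i, src_outer_stride)
        k, offset = divmod(rem, inner)
        # Base address of the destination slice, then a plain offset in it.
        base = outer * dst_outer_stride + index[k] * inner
        grad_x[base + offset] += g
    return grad_x


# grad_out has shape (3, 2); the destination has shape (4, 2); axis = 0.
result = scatter_add_flat([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1, 3, 1], (4, 2), 0)
```

Here row 1 of the destination receives two contributions because index 1 appears twice, which is the situation where accumulating and assigning give different results.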

The corresponding slice case has input: Tensor([108,64,12288],"float32"), index: Tensor([2,4,6],"int64").

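  1. In getitem, when index_size is 1, the gather+reshape kernel is selected as a fast path. fp32 forward gpu score: 0.97 -> 0.68; backward gpu score: 2.73 -> 1.21.
  2. In the gather backward pass, the GPUScatterAdd kernel now supports the case where index.numel() != x.dims()[axis_v].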
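The two points above can be sketched as a gather along axis 0 and its backward pass, in a case where the index has fewer elements than the input has rows. The helper names `gather_axis0` and `gather_axis0_grad` and the nested-list representation are hypothetical; the shapes are shrunk from the PR's example so the result is easy to inspect.

```python
def gather_axis0(x, index):
    """Forward: pick the rows of x listed in index (x is a list of rows)."""
    return [list(x[i]) for i in index]


def gather_axis0_grad(grad_out, index, num_rows):
    """Backward: add each row of grad_out into row index[k] of a zero grad_x.

    len(index) does not have to equal num_rows.
    """
    num_cols = len(grad_out[0])
    grad_x = [[0.0] * num_cols for _ in range(num_rows)]
    for row, i in zip(grad_out, index):
        for c, g in enumerate(row):
            grad_x[i][c] += g
    return grad_x


# 8 input rows, but only 3 indices, mirroring index = [2, 4, 6] in the PR.
x = [[float(r), float(r) + 0.5] for r in range(8)]
index = [2, 4, 6]
out = gather_axis0(x, index)
grad_x = gather_axis0_grad([[1.0, 1.0]] * len(index), index, len(x))
```

Only rows 2, 4 and 6 of `grad_x` end up non-zero; every other row keeps its zero initialization.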

pcard-67164

paddle-bot bot commented Jul 10, 2025

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@zhanghonggeng zhanghonggeng changed the title [slice]list_tensor_gather test [slice]support different shape case for GPUScatterAdd op Jul 14, 2025
@zhanghonggeng
Contributor Author

/re-run all-failed

@xiaoguoguo626807 xiaoguoguo626807 merged commit 77166d2 into PaddlePaddle:develop Jul 15, 2025
83 of 86 checks passed
3 participants