Open
Description
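In the implementation of the `online_safe_softmax_f32_per_token_kernel` kernel, each warp first reduces its values to obtain m and d and writes them to shared memory. A subset of the threads in the block then performs a second round of warp reduce to obtain the m and d for the whole thread block. My question is: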
LeetCUDA/kernels/softmax/softmax.cu, line 350 in 228342c:
`if (local_tid < WARP_SIZE) {`
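Should the condition here instead be `local_tid < WARP_NUM`, with threads that do not satisfy it assigned the MD identity? Otherwise, doesn't this check risk an out-of-bounds access to the shared array?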
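To make the suggestion concrete, here is a minimal sketch of the guarded second-round reduce. It is not the repository's code: `block_md_kernel`, `warp_reduce_md`, and the host helper `block_md` are hypothetical names, the MD identity is assumed to be `{-FLT_MAX, 0.0f}`, and the sketch handles a single row of at most `NUM_THREADS` elements in one block.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <cmath>
#include <vector>

#define WARP_SIZE 32

struct MD { float m; float d; };  // running max and rescaled sum of exponentials

// Butterfly reduction of (m, d) pairs across a full warp (hypothetical helper).
__device__ __forceinline__ MD warp_reduce_md(MD v) {
  for (int stride = WARP_SIZE >> 1; stride >= 1; stride >>= 1) {
    MD o;
    o.m = __shfl_xor_sync(0xffffffffu, v.m, stride);
    o.d = __shfl_xor_sync(0xffffffffu, v.d, stride);
    MD hi = (v.m > o.m) ? v : o;  // pair with the larger max
    MD lo = (v.m > o.m) ? o : v;  // pair with the smaller max
    v.d = hi.d + lo.d * __expf(lo.m - hi.m);
    v.m = hi.m;
  }
  return v;
}

// Assumes NUM_THREADS is a multiple of WARP_SIZE and at most WARP_SIZE * WARP_SIZE.
template <int NUM_THREADS>
__global__ void block_md_kernel(const float* x, MD* out, int n) {
  constexpr int WARP_NUM = NUM_THREADS / WARP_SIZE;
  const MD identity = {-FLT_MAX, 0.0f};  // assumed identity element for the merge
  int tid = threadIdx.x;
  __shared__ MD shared[WARP_NUM];

  // Round 1: each warp reduces its own elements and lane 0 stores the result.
  MD v = (tid < n) ? MD{x[tid], 1.0f} : identity;
  v = warp_reduce_md(v);
  if (tid % WARP_SIZE == 0) shared[tid / WARP_SIZE] = v;
  __syncthreads();

  // Round 2: the whole first warp participates in the shuffle, but only
  // lanes below WARP_NUM read shared memory; the rest carry the identity.
  if (tid < WARP_SIZE) {
    MD r = (tid < WARP_NUM) ? shared[tid] : identity;
    r = warp_reduce_md(r);
    if (tid == 0) *out = r;
  }
}

// Host helper (hypothetical): reduces one row with 1 <= row.size() <= 256.
MD block_md(const std::vector<float>& row) {
  constexpr int NUM_THREADS = 256;
  int n = static_cast<int>(row.size());
  float* d_x = nullptr;
  MD* d_out = nullptr;
  MD h_out = {0.0f, 0.0f};
  cudaMalloc(&d_x, n * sizeof(float));
  cudaMalloc(&d_out, sizeof(MD));
  cudaMemcpy(d_x, row.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  block_md_kernel<NUM_THREADS><<<1, NUM_THREADS>>>(d_x, d_out, n);
  cudaMemcpy(&h_out, d_out, sizeof(MD), cudaMemcpyDeviceToHost);
  cudaFree(d_x);
  cudaFree(d_out);
  return h_out;
}
```

In this sketch the branch stays at `tid < WARP_SIZE` so that every lane named in the full shuffle mask takes part, and the bound check moves to the shared-memory load. Lanes at or above `WARP_NUM` contribute the identity, so they change neither `m` nor `d`.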