Question about bank conflict calculation in sgemm #399

@chefwang-cloid

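Hello, I have only recently started learning CUDA, and I would like to ask how the bank conflicts for the two shared-memory arrays s_a and s_b are worked out in sgemm_t_8x8_sliced_k_f32x4_bcf_kernel in sgemm.cu.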

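s_b store and load
When storing to s_b, each thread stores 4 floats at once via FLOAT4, which is 16 bytes. The 32 banks together are 128 bytes wide, so the 32 threads of a warp write in 4 passes of 8 threads each. My understanding is that the banks accessed by the 8 threads of one pass do not conflict, is that right? The comment in the kernel reads: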
2. bank layout analysis: s_b[8][128] same as s_a[8][128]
3. bank conflicts analysis: s_b[8][128]
tid 0 -> k 0, n 0 -> all access bank 0-3 (layer_0)
tid 1 -> k 0, n 4 -> all access bank 4-7 (layer_0)
tid 2 -> k 0, n 8 -> all access bank 8-11 (layer_0)
tid 7 -> k 0, n 28 -> all access bank 28-31 (layer_0)
tid 8 -> k 0, n 32 -> all access bank 0-3 (layer_1)
... ... ... ...
tid 15 -> k 0, n 60 -> all access bank 28-31 (layer_1)
tid 16 -> k 0, n 64 -> all access bank 0-3 (layer_2)
... ... ... ...
tid 31 -> k 0, n 124 -> all access bank 28-31 (layer_3)
conclusion: we still have bank conflicts within warp,
0/8/16/24 -> bank 0-3, 1/9/17/25 -> bank 4-7, etc.
thus, we still need 4 memory issues at least per warp.

In that case, the 8 threads of one pass access the banks as follows:
tid 0 -> k 0, n 0 -> all access bank 0-3 (layer_0)
tid 1 -> k 0, n 4 -> all access bank 4-7 (layer_0)
tid 2 -> k 0, n 8 -> all access bank 8-11 (layer_0)
tid 7 -> k 0, n 28 -> all access bank 28-31 (layer_0)

Likewise, loading from s_b follows the same logic as storing: each thread reads only 4 floats at a time, so reading 128 bytes takes 8 threads, and there should be no bank conflict either.
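The reasoning above can be checked with a small host-side model. This is only a sketch under stated assumptions, not the hardware's documented behavior: it assumes 32 banks of 4 bytes each, that a float4 access is served one quarter-warp (8 lanes) per pass, and that lane `lane` touches s_b[k][lane * 4 .. lane * 4 + 3] as in the listing. The function names are hypothetical and do not come from sgemm.cu.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

constexpr int kBanks = 32;       // 32 banks, each 4 bytes (one float) wide
constexpr int kQuarterWarp = 8;  // ASSUMPTION: a float4 access is served 8 lanes per pass

// Float offset into s_b[8][128] touched by `lane` for row k, column n = lane * 4.
int sb_float_offset(int k, int lane) {
    assert(k >= 0 && k < 8 && lane >= 0 && lane < 32);
    return k * 128 + lane * 4;
}

// Largest number of distinct 32-bit words that land in one bank within the
// quarter-warp starting at first_lane. A result of 1 means that pass is
// conflict-free under this model.
int quarter_warp_conflict_degree(int k, int first_lane) {
    std::array<std::set<int>, kBanks> words_per_bank;
    for (int lane = first_lane; lane < first_lane + kQuarterWarp; ++lane) {
        const int base = sb_float_offset(k, lane);
        for (int i = 0; i < 4; ++i) {
            words_per_bank[(base + i) % kBanks].insert(base + i);
        }
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}

// Total passes for one warp: the sum of the degrees of its four quarter-warps.
int passes_per_warp(int k) {
    int passes = 0;
    for (int first = 0; first < 32; first += kQuarterWarp) {
        passes += quarter_warp_conflict_degree(k, first);
    }
    return passes;
}
```

Under these assumptions every quarter-warp touches each bank exactly once, so `passes_per_warp` returns 4. That is the minimum a warp needs when it requests 512 bytes and one pass covers 128 bytes, which is consistent with the "4 memory issues at least per warp" line in the comment. Whether a profiler reports those 4 passes as bank conflicts is a separate question that is best checked on the target GPU.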

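s_a store and load
I roughly understand why storing to s_a produces a 2-way conflict.
When loading from s_a, each thread loads 4 floats at once via FLOAT4, so 8 threads cover the full width of the 32 banks. Threads t0-t15 all access banks 0-3. Does that cause an 8-way conflict, or is the request served by broadcast, so that there is no bank conflict? The comment in the kernel reads: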
bank conflicts analysis, tx/ty 0-15, 0-7 bank 4*8=32 bytes
tid 0-15 access bank 0-3, tid 16-31 access bank 4-7, etc.
tid 0, tk 0 -> ty 0 -> [0][0+0-3],[0][64+0-3] -> bank 0-3(layer_0/2),
tid 0, tk 7 -> ty 0 -> [7][0+0-3],[7][64+0-3] -> bank 0-3(layer_28/30),
tid 15, tk 0 -> ty 0 -> [0][0+0-3],[0][64+0-3] -> bank 0-3(layer_0/2),
tid 15, tk 7 -> ty 0 -> [7][0+0-3],[7][64+0-3] -> bank 0-3(layer_28/30),
tid 16, tk 0 -> ty 1 -> [0][0+4-7],[0][64+4-7] -> bank 4-7(layer_0/2),
tid 16, tk 7 -> ty 1 -> [7][0+4-7],[7][64+4-7] -> bank 4-7(layer_28/30),
tid 31, tk 0 -> ty 1 -> [0][0+4-7],[0][64+4-7] -> bank 4-7(layer_0/2),
tid 31, tk 7 -> ty 1 -> [7][0+4-7],[7][64+4-7] -> bank 4-7(layer_28/30),
tid 255,tk 0 -> ty 15 -> [0][0+60-63],[0][64+60-63] -> bank 28-31(layer_1/3),
tid 255,tk 7 -> ty 15 -> [7][0+60-63],[7][64+60-63] -> bank 28-31(layer_29/31),
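One way to look at the broadcast question is to count, per bank, both how many threads hit it and how many distinct 32-bit words they ask for. The sketch below does that on the host. It assumes ty = tid / 16 and that each thread reads s_a[tk][col_shift + ty * 4 .. + 3], as read off the comment above, and it only models the first warp (tid 0-31). The names `sa_float_offset`, `Degree` and `sa_load_degree` are hypothetical. It also relies on the rule, stated in the CUDA C++ Programming Guide for compute capability 5.x and later, that threads reading the same 32-bit word do not conflict with each other.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <set>

constexpr int kBanks = 32;  // 32 banks, each 4 bytes (one float) wide

// ASSUMPTION (from the quoted comment): ty = tid / 16, and the thread reads
// four floats starting at s_a[tk][col_shift + ty * 4], col_shift being 0 or 64.
int sa_float_offset(int tk, int tid, int col_shift) {
    const int ty = tid / 16;
    return tk * 128 + col_shift + ty * 4;
}

struct Degree {
    int distinct_words;  // worst-case distinct words per bank (serialization)
    int threads;         // worst-case threads per bank (naive count)
};

// Examine lanes [first_lane, first_lane + num_lanes) of the first warp.
Degree sa_load_degree(int tk, int first_lane, int num_lanes, int col_shift) {
    assert(first_lane >= 0 && num_lanes > 0 && first_lane + num_lanes <= 32);
    std::array<std::set<int>, kBanks> words;
    std::array<int, kBanks> hits{};
    for (int tid = first_lane; tid < first_lane + num_lanes; ++tid) {
        const int base = sa_float_offset(tk, tid, col_shift);
        for (int i = 0; i < 4; ++i) {
            const int bank = (base + i) % kBanks;
            words[bank].insert(base + i);
            ++hits[bank];
        }
    }
    Degree d{0, 0};
    for (int b = 0; b < kBanks; ++b) {
        d.distinct_words = std::max(d.distinct_words, static_cast<int>(words[b].size()));
        d.threads = std::max(d.threads, hits[b]);
    }
    return d;
}
```

For lanes 0-7 this gives `threads == 8` but `distinct_words == 1`: eight threads hit banks 0-3, yet they all request the same four words. Under the same-word rule that is a broadcast rather than an 8-way conflict. The model says nothing about how many passes the hardware actually issues for a 128-bit load, so confirming it with a profiler is still worthwhile.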
