
[AMDGPU] add tests for loop definition of bitconvert #133052

Merged: 15 commits merged into llvm:main on Apr 8, 2025.

Shoreshen (Contributor) commented Mar 26, 2025

This PR adds test cases for all types of bit conversion, in preparation for PR #132899.

All tests pass because:

  1. For SelectionDAG, the patterns do not distinguish between SReg and VReg. One example:
    define <2 x double> @v_bitcast_v4f32_to_v2f64(<4 x float> inreg %a, i32 %b) {
     %cmp = icmp eq i32 %b, 0
     br i1 %cmp, label %cmp.true, label %cmp.false
    
    cmp.true:
     %a1 = fadd <4 x float> %a, splat (float 1.000000e+00)
     %a2 = bitcast <4 x float> %a1 to <2 x double>
     br label %end
    
    cmp.false:
     %a3 = bitcast <4 x float> %a to <2 x double>
     br label %end
    
    end:
     %phi = phi <2 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
     ret <2 x double> %phi
    }
    
    Because %a is passed inreg, this is supposed to select a scalar-register (SReg) pattern, but the VReg pattern is matched instead, as follows:
    Debug log:
    ISEL: Starting selection on root node: t3: v2f64 = bitcast t2
    ISEL: Starting pattern match
     Initial Opcode index to 440336
     Skipped scope entry (due to false predicate) at index 440339, continuing at 440367
     Skipped scope entry (due to false predicate) at index 440368, continuing at 440396
     Skipped scope entry (due to false predicate) at index 440397, continuing at 440435
     Skipped scope entry (due to false predicate) at index 440436, continuing at 440467
     Skipped scope entry (due to false predicate) at index 440468, continuing at 440499
     Skipped scope entry (due to false predicate) at index 440500, continuing at 440552
     Skipped scope entry (due to false predicate) at index 440553, continuing at 440587
     Skipped scope entry (due to false predicate) at index 440588, continuing at 440622
     Skipped scope entry (due to false predicate) at index 440623, continuing at 440657
     Skipped scope entry (due to false predicate) at index 440658, continuing at 440692
     Skipped scope entry (due to false predicate) at index 440693, continuing at 440727
     Skipped scope entry (due to false predicate) at index 440728, continuing at 440769
     Skipped scope entry (due to false predicate) at index 440770, continuing at 440798
     Skipped scope entry (due to false predicate) at index 440799, continuing at 440836
     Skipped scope entry (due to false predicate) at index 440837, continuing at 440870
     TypeSwitch[v2f64] from 440873 to 440892
    
    Patterns:
    /*440892*/    OPC_CompleteMatch, 1, 0, 
                  // Src: (bitconvert:{ *:[v2f64] } VReg_128:{ *:[v4f32] }:$src0) - Complexity = 3
                  // Dst: VReg_128:{ *:[v2f64] }:$src0
    
  2. GlobalISel uses `Select_COPY` to select bitcasts.
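Every test in the new files has the same shape as the sample above: a conditional branch, one arm that perturbs `%a` before the bitcast (the generated checks show an add of 3 for integer types and an fadd of 1.0 for floating-point types), one arm that bitcasts `%a` directly, and a `phi` joining them. A minimal Python sketch of a generator for that shape, assuming types are given as `(count, element)` tuples with `count=None` for scalars; the helper names, the type model, and the `_inreg` name suffix are illustrative and not taken from the PR, and the emitted text has not been run through `llc`.

```python
# Element type -> (bit width, short name used in test-function names).
ELEM = {
    "i8": (8, "i8"), "i16": (16, "i16"), "i32": (32, "i32"), "i64": (64, "i64"),
    "half": (16, "f16"), "bfloat": (16, "bf16"),
    "float": (32, "f32"), "double": (64, "f64"),
}


def ir_type(count, elem):
    """Spell an LLVM IR type; count=None means a scalar."""
    return elem if count is None else f"<{count} x {elem}>"


def mangle(count, elem):
    """Short type name, e.g. (4, 'float') -> 'v4f32'."""
    short = ELEM[elem][1]
    return short if count is None else f"v{count}{short}"


def bit_width(count, elem):
    return ELEM[elem][0] * (1 if count is None else count)


def perturb(count, elem):
    """Instruction that modifies %a before the bitcast in the cmp.true arm."""
    if elem.startswith("i"):
        op, const = "add", "3"
    else:
        op, const = "fadd", "1.000000e+00"
    if count is not None:
        const = f"splat ({elem} {const})"
    return f"%a1 = {op} {ir_type(count, elem)} %a, {const}"


def bitcast_test(src, dst, inreg=False):
    """Emit one test function bitcasting src -> dst (hypothetical naming)."""
    if bit_width(*src) != bit_width(*dst):
        raise ValueError("bitcast requires equal bit widths")
    s_ty, d_ty = ir_type(*src), ir_type(*dst)
    name = f"v_bitcast_{mangle(*src)}_to_{mangle(*dst)}" + ("_inreg" if inreg else "")
    attr = "inreg " if inreg else ""  # inreg asks for %a in scalar registers
    return "\n".join([
        f"define {d_ty} @{name}({s_ty} {attr}%a, i32 %b) {{",
        "  %cmp = icmp eq i32 %b, 0",
        "  br i1 %cmp, label %cmp.true, label %cmp.false",
        "",
        "cmp.true:",
        "  " + perturb(*src),
        f"  %a2 = bitcast {s_ty} %a1 to {d_ty}",
        "  br label %end",
        "",
        "cmp.false:",
        f"  %a3 = bitcast {s_ty} %a to {d_ty}",
        "  br label %end",
        "",
        "end:",
        f"  %phi = phi {d_ty} [ %a2, %cmp.true ], [ %a3, %cmp.false ]",
        f"  ret {d_ty} %phi",
        "}",
    ])


print(bitcast_test((4, "float"), (2, "double"), inreg=True))
```

Emitting both an `inreg` and a non-`inreg` variant per type pair is what would exercise the SReg and VReg selection paths separately.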

llvmbot (Member) commented Mar 26, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: None (Shoreshen)


---

Patch is 1.78 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133052.diff


16 Files Affected:

- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll (+2394) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.128bit.ll (+6084) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.160bit.ll (+178) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.16bit.ll (+556) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.192bit.ll (+1062) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.224bit.ll (+194) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll (+9118) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.288bit.ll (+209) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll (+220) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.32bit.ll (+1960) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.352bit.ll (+228) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.384bit.ll (+235) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll (+15566) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.64bit.ll (+4574) 
- (added) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.96bit.ll (+163) 
- (modified) llvm/test/lit.cfg.py (+1-1) 
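The new files are split by total bit width, and any two distinct types of the same width form a candidate bitcast test, in each direction. A small Python sketch that lists the candidates for a given width, assuming only integer, `half`, `bfloat`, `float`, and `double` elements; pointer types are not modeled, and the subset the PR actually covers may differ.

```python
from itertools import permutations

# Element type -> bit width (assumed set; pointers deliberately omitted).
ELEM_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64,
             "half": 16, "bfloat": 16, "float": 32, "double": 64}


def types_of_width(width):
    """All scalar or vector IR types whose total size is exactly `width` bits."""
    out = []
    for elem, bits in ELEM_BITS.items():
        if width % bits:
            continue  # element does not tile the width evenly
        count = width // bits
        out.append(elem if count == 1 else f"<{count} x {elem}>")
    return out


def bitcast_pairs(width):
    """Every ordered (src, dst) pair, so both directions are covered."""
    return list(permutations(types_of_width(width), 2))


for w in (16, 32, 64):
    print(f"amdgcn.bitcast.{w}bit.ll: {len(types_of_width(w))} types, "
          f"{len(bitcast_pairs(w))} ordered pairs")
```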


``````````diff
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
new file mode 100644
index 0000000000000..9134339cd1665
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
@@ -0,0 +1,2394 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <32 x float> @v_bitcast_v32i32_to_v32f32(<32 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i32_to_v32f32:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT:    s_waitcnt vmcnt(1)
+; GCN-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT:    s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT:    s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT:    s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT:    s_cbranch_execz .LBB0_2
+; GCN-NEXT:  ; %bb.1: ; %cmp.true
+; GCN-NEXT:    s_waitcnt vmcnt(0)
+; GCN-NEXT:    v_add_i32_e32 v31, vcc, 3, v31
+; GCN-NEXT:    v_add_i32_e32 v30, vcc, 3, v30
+; GCN-NEXT:    v_add_i32_e32 v29, vcc, 3, v29
+; GCN-NEXT:    v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT:    v_add_i32_e32 v27, vcc, 3, v27
+; GCN-NEXT:    v_add_i32_e32 v26, vcc, 3, v26
+; GCN-NEXT:    v_add_i32_e32 v25, vcc, 3, v25
+; GCN-NEXT:    v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT:    v_add_i32_e32 v23, vcc, 3, v23
+; GCN-NEXT:    v_add_i32_e32 v22, vcc, 3, v22
+; GCN-NEXT:    v_add_i32_e32 v21, vcc, 3, v21
+; GCN-NEXT:    v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT:    v_add_i32_e32 v19, vcc, 3, v19
+; GCN-NEXT:    v_add_i32_e32 v18, vcc, 3, v18
+; GCN-NEXT:    v_add_i32_e32 v17, vcc, 3, v17
+; GCN-NEXT:    v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT:    v_add_i32_e32 v15, vcc, 3, v15
+; GCN-NEXT:    v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT:    v_add_i32_e32 v13, vcc, 3, v13
+; GCN-NEXT:    v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT:    v_add_i32_e32 v11, vcc, 3, v11
+; GCN-NEXT:    v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT:    v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT:    v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT:    v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT:    v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT:    v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT:    v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT:    v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT:    v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT:    v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT:    v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT:  .LBB0_2: ; %end
+; GCN-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT:    s_waitcnt vmcnt(0)
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i32_to_v32f32:
+; VI:       ; %bb.0:
+; VI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT:    buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT:    s_waitcnt vmcnt(0)
+; VI-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT:    s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT:    s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT:    s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT:    s_cbranch_execz .LBB0_2
+; VI-NEXT:  ; %bb.1: ; %cmp.true
+; VI-NEXT:    s_waitcnt vmcnt(0)
+; VI-NEXT:    v_add_u32_e32 v31, vcc, 3, v31
+; VI-NEXT:    v_add_u32_e32 v30, vcc, 3, v30
+; VI-NEXT:    v_add_u32_e32 v29, vcc, 3, v29
+; VI-NEXT:    v_add_u32_e32 v28, vcc, 3, v28
+; VI-NEXT:    v_add_u32_e32 v27, vcc, 3, v27
+; VI-NEXT:    v_add_u32_e32 v26, vcc, 3, v26
+; VI-NEXT:    v_add_u32_e32 v25, vcc, 3, v25
+; VI-NEXT:    v_add_u32_e32 v24, vcc, 3, v24
+; VI-NEXT:    v_add_u32_e32 v23, vcc, 3, v23
+; VI-NEXT:    v_add_u32_e32 v22, vcc, 3, v22
+; VI-NEXT:    v_add_u32_e32 v21, vcc, 3, v21
+; VI-NEXT:    v_add_u32_e32 v20, vcc, 3, v20
+; VI-NEXT:    v_add_u32_e32 v19, vcc, 3, v19
+; VI-NEXT:    v_add_u32_e32 v18, vcc, 3, v18
+; VI-NEXT:    v_add_u32_e32 v17, vcc, 3, v17
+; VI-NEXT:    v_add_u32_e32 v16, vcc, 3, v16
+; VI-NEXT:    v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT:    v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT:    v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT:    v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT:    v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT:    v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT:    v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT:    v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT:    v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT:    v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT:    v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT:    v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT:    v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT:    v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT:    v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT:    v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT:  .LBB0_2: ; %end
+; VI-NEXT:    s_or_b64 exec, exec, s[4:5]
+; VI-NEXT:    s_waitcnt vmcnt(0)
+; VI-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i32_to_v32f32:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT:    s_waitcnt vmcnt(0)
+; GFX9-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT:    s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT:    s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT:    s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT:    s_cbranch_execz .LBB0_2
+; GFX9-NEXT:  ; %bb.1: ; %cmp.true
+; GFX9-NEXT:    s_waitcnt vmcnt(0)
+; GFX9-NEXT:    v_add_u32_e32 v31, 3, v31
+; GFX9-NEXT:    v_add_u32_e32 v30, 3, v30
+; GFX9-NEXT:    v_add_u32_e32 v29, 3, v29
+; GFX9-NEXT:    v_add_u32_e32 v28, 3, v28
+; GFX9-NEXT:    v_add_u32_e32 v27, 3, v27
+; GFX9-NEXT:    v_add_u32_e32 v26, 3, v26
+; GFX9-NEXT:    v_add_u32_e32 v25, 3, v25
+; GFX9-NEXT:    v_add_u32_e32 v24, 3, v24
+; GFX9-NEXT:    v_add_u32_e32 v23, 3, v23
+; GFX9-NEXT:    v_add_u32_e32 v22, 3, v22
+; GFX9-NEXT:    v_add_u32_e32 v21, 3, v21
+; GFX9-NEXT:    v_add_u32_e32 v20, 3, v20
+; GFX9-NEXT:    v_add_u32_e32 v19, 3, v19
+; GFX9-NEXT:    v_add_u32_e32 v18, 3, v18
+; GFX9-NEXT:    v_add_u32_e32 v17, 3, v17
+; GFX9-NEXT:    v_add_u32_e32 v16, 3, v16
+; GFX9-NEXT:    v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT:    v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT:    v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT:    v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT:    v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT:    v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT:    v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT:    v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT:    v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT:    v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT:    v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT:    v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT:    v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT:    v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT:    v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT:    v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT:  .LBB0_2: ; %end
+; GFX9-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT:    s_waitcnt vmcnt(0)
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i32_to_v32f32:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    s_clause 0x1
+; GFX11-NEXT:    scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT:    scratch_load_b32 v31, off, s32
+; GFX11-NEXT:    s_mov_b32 s0, exec_lo
+; GFX11-NEXT:    s_waitcnt vmcnt(1)
+; GFX11-NEXT:    v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT:    s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT:    s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT:    s_cbranch_execz .LBB0_2
+; GFX11-NEXT:  ; %bb.1: ; %cmp.true
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
+; GFX11-NEXT:    v_add_nc_u32_e32 v31, 3, v31
+; GFX11-NEXT:    v_add_nc_u32_e32 v30, 3, v30
+; GFX11-NEXT:    v_add_nc_u32_e32 v29, 3, v29
+; GFX11-NEXT:    v_add_nc_u32_e32 v28, 3, v28
+; GFX11-NEXT:    v_add_nc_u32_e32 v27, 3, v27
+; GFX11-NEXT:    v_add_nc_u32_e32 v26, 3, v26
+; GFX11-NEXT:    v_add_nc_u32_e32 v25, 3, v25
+; GFX11-NEXT:    v_add_nc_u32_e32 v24, 3, v24
+; GFX11-NEXT:    v_add_nc_u32_e32 v23, 3, v23
+; GFX11-NEXT:    v_add_nc_u32_e32 v22, 3, v22
+; GFX11-NEXT:    v_add_nc_u32_e32 v21, 3, v21
+; GFX11-NEXT:    v_add_nc_u32_e32 v20, 3, v20
+; GFX11-NEXT:    v_add_nc_u32_e32 v19, 3, v19
+; GFX11-NEXT:    v_add_nc_u32_e32 v18, 3, v18
+; GFX11-NEXT:    v_add_nc_u32_e32 v17, 3, v17
+; GFX11-NEXT:    v_add_nc_u32_e32 v16, 3, v16
+; GFX11-NEXT:    v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT:    v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT:    v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT:    v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT:    v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT:    v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT:    v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT:    v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT:    v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT:    v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT:    v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT:    v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT:    v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT:    v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT:    v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT:    v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT:  .LBB0_2: ; %end
+; GFX11-NEXT:    s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %cmp = icmp eq i32 %b, 0
+  br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+  %a1 = add <32 x i32> %a, splat (i32 3)
+  %a2 = bitcast <32 x i32> %a1 to <32 x float>
+  br label %end
+
+cmp.false:
+  %a3 = bitcast <32 x i32> %a to <32 x float>
+  br label %end
+
+end:
+  %phi = phi <32 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+  ret <32 x float> %phi
+}
+
+define <32 x i32> @v_bitcast_v32f32_to_v32i32(<32 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f32_to_v32i32:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT:    s_waitcnt vmcnt(1)
+; GCN-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT:    s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT:    s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT:    s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT:    s_cbranch_execz .LBB1_2
+; GCN-NEXT:  ; %bb.1: ; %cmp.true
+; GCN-NEXT:    s_waitcnt vmcnt(0)
+; GCN-NEXT:    v_add_f32_e32 v31, 1.0, v31
+; GCN-NEXT:    v_add_f32_e32 v30, 1.0, v30
+; GCN-NEXT:    v_add_f32_e32 v29, 1.0, v29
+; GCN-NEXT:    v_add_f32_e32 v28, 1.0, v28
+; GCN-NEXT:    v_add_f32_e32 v27, 1.0, v27
+; GCN-NEXT:    v_add_f32_e32 v26, 1.0, v26
+; GCN-NEXT:    v_add_f32_e32 v25, 1.0, v25
+; GCN-NEXT:    v_add_f32_e32 v24, 1.0, v24
+; GCN-NEXT:    v_add_f32_e32 v23, 1.0, v23
+; GCN-NEXT:    v_add_f32_e32 v22, 1.0, v22
+; GCN-NEXT:    v_add_f32_e32 v21, 1.0, v21
+; GCN-NEXT:    v_add_f32_e32 v20, 1.0, v20
+; GCN-NEXT:    v_add_f32_e32 v19, 1.0, v19
+; GCN-NEXT:    v_add_f32_e32 v18, 1.0, v18
+; GCN-NEXT:    v_add_f32_e32 v17, 1.0, v17
+; GCN-NEXT:    v_add_f32_e32 v16, 1.0, v16
+; GCN-NEXT:    v_add_f32_e32 v15, 1.0, v15
+; GCN-NEXT:    v_add_f32_e32 v14, 1.0, v14
+; GCN-NEXT:    v_add_f32_e32 v13, 1.0, v13
+; GCN-NEXT:    v_add_f32_e32 v12, 1.0, v12
+; GCN-NEXT:    v_add_f32_e32 v11, 1.0, v11
+; GCN-NEXT:    v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT:    v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT:    v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT:    v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT:    v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT:    v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT:    v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT:    v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT:    v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT:    v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT:    v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT:  .LBB1_2: ; %end
+; GCN-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT:    s_waitcnt vmcnt(0)
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f32_to_v32i32:
+; VI:       ; %bb.0:
+; VI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT:    buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT:    s_waitcnt vmcnt(0)
+; VI-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT:    s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT:    s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT:    s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT:    s_cbranch_execz .LBB1_2
+; VI-NEXT:  ; %bb.1: ; %cmp.true
+; VI-NEXT:    s_waitcnt vmcnt(0)
+; VI-NEXT:    v_add_f32_e32 v31, 1.0, v31
+; VI-NEXT:    v_add_f32_e32 v30, 1.0, v30
+; VI-NEXT:    v_add_f32_e32 v29, 1.0, v29
+; VI-NEXT:    v_add_f32_e32 v28, 1.0, v28
+; VI-NEXT:    v_add_f32_e32 v27, 1.0, v27
+; VI-NEXT:    v_add_f32_e32 v26, 1.0, v26
+; VI-NEXT:    v_add_f32_e32 v25, 1.0, v25
+; VI-NEXT:    v_add_f32_e32 v24, 1.0, v24
+; VI-NEXT:    v_add_f32_e32 v23, 1.0, v23
+; VI-NEXT:    v_add_f32_e32 v22, 1.0, v22
+; VI-NEXT:    v_add_f32_e32 v21, 1.0, v21
+; VI-NEXT:    v_add_f32_e32 v20, 1.0, v20
+; VI-NEXT:    v_add_f32_e32 v19, 1.0, v19
+; VI-NEXT:    v_add_f32_e32 v18, 1.0, v18
+; VI-NEXT:    v_add_f32_e32 v17, 1.0, v17
+; VI-NEXT:    v_add_f32_e32 v16, 1.0, v16
+; VI-NEXT:    v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT:    v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT:    v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT:    v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT:    v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT:    v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT:    v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT:    v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT:    v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT:    v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT:    v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT:    v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT:    v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT:    v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT:    v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT:    v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT:  .LBB1_2: ; %end
+; VI-NEXT:    s_or_b64 exec, exec, s[4:5]
+; VI-NEXT:    s_waitcnt vmcnt(0)
+; VI-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f32_to_v32i32:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT:    s_waitcnt vmcnt(0)
+; GFX9-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT:    s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT:    s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT:    s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT:    s_cbranch_execz .LBB1_2
+; GFX9-NEXT:  ; %bb.1: ; %cmp.true
+; GFX9-NEXT:    s_waitcnt vmcnt(0)
+; GFX9-NEXT:    v_add_f32_e32 v31, 1.0, v31
+; GFX9-NEXT:    v_add_f32_e32 v30, 1.0, v30
+; GFX9-NEXT:    v_add_f32_e32 v29, 1.0, v29
+; GFX9-NEXT:    v_add_f32_e32 v28, 1.0, v28
+; GFX9-NEXT:    v_add_f32_e32 v27, 1.0, v27
+; GFX9-NEXT:    v_add_f32_e32 v26, 1.0, v26
+; GFX9-NEXT:    v_add_f32_e32 v25, 1.0, v25
+; GFX9-NEXT:    v_add_f32_e32 v24, 1.0, v24
+; GFX9-NEXT:    v_add_f32_e32 v23, 1.0, v23
+; GFX9-NEXT:    v_add_f32_e32 v22, 1.0, v22
+; GFX9-NEXT:    v_add_f32_e32 v21, 1.0, v21
+; GFX9-NEXT:    v_add_f32_e32 v20, 1.0, v20
+; GFX9-NEXT:    v_add_f32_e32 v19, 1.0, v19
+; GFX9-NEXT:    v_add_f32_e32 v18, 1.0, v18
+; GFX9-NEXT:    v_add_f32_e32 v17, 1.0, v17
+; GFX9-NEXT:    v_add_f32_e32 v16, 1.0, v16
+; GFX9-NEXT:    v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT:    v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT:    v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT:    v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT:    v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT:    v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT:    v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT:    v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT:    v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT:    v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT:    v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT:    v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT:    v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT:    v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT:    v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT:    v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT:  .LBB1_2: ; %end
+; GFX9-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT:    s_waitcnt vmcnt(0)
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f32_to_v32i32:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    s_clause 0x1
+; GFX11-NEXT:    scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT:    scratch_load_b32 v31, off, s32
+; GFX11-NEXT:    s_mov_b32 s0, exec_lo
+; GFX11-NEXT:    s_waitcnt vmcnt(1)
+; GFX11-NEXT:    v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT:    s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT:    s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT:    s_cbranch_execz .LBB1_2
+; GFX11-NEXT:  ; %bb.1: ; %cmp.true
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
+; GFX11-NEXT:    v_dual_add_f32 v31, 1.0, v31 :: v_dual_add_f32 v30, 1.0, v30
+; GFX11-NEXT:    v_dual_add_f32 v29, 1.0, v29 :: v_dual_add_f32 v28, 1.0, v28
+; GFX11-NEXT:    v_dual_add_f32 v27, 1.0, v27 :: v_dual_add_f32 v26, 1.0, v26
+; GFX11-NEXT:    v_dual_add_f32 v25, 1.0, v25 :: v_dual_add_f32 v24, 1.0, v24
+; GFX11-NEXT:    v_dual_add_f32 v23, 1.0, v23 :: v_dual_add_f32 v22, 1.0, v22
+; GFX11-NEXT:    v_dual_add_f32 v21, 1.0, v21 :: v_dual_add_f32 v20, 1.0, v20
+; GFX11-NEXT:    v_dual_add_f32 v19, 1.0, v19 :: v_dual_add_f32 v18, 1.0, v18
+; GFX11-NEXT:    v_dual_add_f32 v17, 1.0, v17 :: v_dual_add_f32 v16, 1.0, v16
+; GFX11-NEXT:    v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT:    v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT:    v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT:    v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT:    v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT:    v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT:    v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT:    v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT:  .LBB1_2: ; %end
+; GFX11-NEXT:    s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %cmp = icmp eq i32 %b, 0
+  br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+  %a1 = fadd <32 x float> %a, splat (float 1.000000e+00)
+  %a2 = bitcast <32 x float> %a1 to <32 x i32>
+  br label %end
+
+cmp.false:
+  %a3 = bitcast <32 x float> %a to <32 x i32>
+  br label %end
+
+end:
+  %phi = phi <32 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+  ret <32 x i32> %phi
+}
+
+define <16 x i64> @v_bitcast_v32i32_to_v16i64(<32 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i32_to_v16i64:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT:    s_waitcnt vmcnt(1)
+; GCN-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT:    s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT:    s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT:    s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT:    s_cbranch_execz .LBB2_2
+; GCN-NEXT:  ; %bb.1: ; %cmp.true
+; GCN-NEXT:    s_waitcnt vmcnt(0)
+; GCN-NEXT:    v_add_i32_e32 v31, vcc, 3, v31
+; GCN-NEXT:    v_add_i32_e32 v30, vcc, 3, v30
+; GCN-NEXT:    v_add_i32_e32 v29, vcc, 3, v29
+; GCN-NEXT:    v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT:    v_add_i32_e32 v27, vcc, 3, v27
+; GCN-NEXT:    v_add_i32_e32 v26, vcc, 3, v26
+; GCN-NEXT:    v_add_i32_e32 v25, vcc, 3, v25
+; GCN-NEXT:    v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT:    v_add_i32_e32 v23, vcc, 3, v23
+; GCN-NEXT:    v_add_i32_e32 v22, vcc, 3, v22
+; GCN-NEXT:    v_add_i32_e32 v21, vcc, 3, v21
+; GCN-NEXT:    v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT:    v_add_i32_e32 v19, vcc, 3, v19
+; GCN-NEXT:    v_add_i32_e32 v18, vcc, 3, v18
+; GCN-NEXT:    v_add_i32_e32 v17, vcc, 3, v17
+; GCN-NEXT:    v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT:    v_add_i32_e32 v15, vcc, 3, v15
+; GCN-NEXT:    v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT:   ...
[truncated]
``````````

Shoreshen requested review from arsenm and shiltian on Mar 26, 2025 08:45.

Pierre-vh (Contributor) commented:

Please add the [AMDGPU] prefix to the title.

Also, what are these lit tests for? Why do we need to test all widths?

Shoreshen changed the title from "add tests for loop definition of bitconvert" to "[AMDGPU] add tests for loop definition of bitconvert" on Mar 26, 2025.
Shoreshen (Contributor, Author) commented:

> Please add the [AMDGPU] prefix to the title.
>
> Also, what are these lit tests for? Why do we need to test all widths?

Hi @Pierre-vh, this PR adds the tests for #132899, which uses a loop in TableGen to generate the bitconvert patterns instead of defining them one by one. Thanks!
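To illustrate what a loop-based definition buys over writing each pattern by hand, here is a Python sketch that expands a group of same-size types into every ordered pair for each register class, the way a nested TableGen `foreach` would. It assumes a helper of the form `BitConvert <dst, src, regclass>` like the one AMDGPU's TableGen files appear to use for individual bitconvert patterns; the exact helper and loop structure in #132899 may differ, and the 128-bit type list and the `SReg_128`/`VReg_128` pairing are illustrative.

```python
from itertools import permutations

# Illustrative 128-bit type group (TableGen value-type names).
TYPES_128 = ["v4i32", "v4f32", "v2i64", "v2f64", "v8i16", "v8f16"]
# Scalar (SGPR) and vector (VGPR) register classes of the same size.
CLASSES_128 = ["SReg_128", "VReg_128"]


def expand_bitconverts(types, classes):
    """Emit one `def : BitConvert <dst, src, RC>;` line per ordered pair and class,
    mimicking what a nested TableGen foreach over the group would produce."""
    lines = []
    for rc in classes:
        for src, dst in permutations(types, 2):
            lines.append(f"def : BitConvert <{dst}, {src}, {rc}>;")
    return lines


defs = expand_bitconverts(TYPES_128, CLASSES_128)
print(len(defs), "patterns, e.g.", defs[0])
```

Six types and two classes already give 60 definitions, which is why enumerating them one by one is error-prone and why per-width tests are useful to catch a missing pair.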

arsenm (Contributor) commented Mar 27, 2025

> Also, what are these lit tests for? Why do we need to test all widths?

Every operation with every type should be tested.

For this case, we should already have test coverage, so I'm not sure I understand what is new here. We should perhaps split or replace the existing test with a more consistent pattern, but this patch is purely additive?

Review comment (Contributor) on the following lines:

; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s

define <5 x float> @v_bitcast_v5i32_to_v5f32(<5 x i32> %a, i32 %b) {

Doesn't have a scalar version.
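Functions that, like this one, lack a scalar counterpart can be found mechanically: scan the `define` lines and compare the `(source type, result type)` pairs whose first argument is `inreg` against those whose first argument is not. A Python sketch, assuming the scalar variant is recognized by `inreg` on `%a` rather than by a naming convention, and that parameter types contain no commas (true for the vector types used here). The `s_bitcast_...` name in the sample is hypothetical.

```python
import re

DEFINE = re.compile(r"^\s*define (.+?) @([\w.]+)\((.*)\)", re.MULTILINE)


def missing_scalar_variants(ll_text):
    """Return (src_type, ret_type) pairs tested only with a non-inreg %a."""
    vector_path, scalar_path = set(), set()
    for m in DEFINE.finditer(ll_text):
        ret_ty, params = m.group(1), m.group(3)
        first = params.split(",")[0].strip()  # e.g. "<5 x i32> inreg %a"
        is_inreg = " inreg " in f" {first} "
        src_ty = first.replace(" inreg", "").rsplit(" ", 1)[0]
        (scalar_path if is_inreg else vector_path).add((src_ty, ret_ty))
    return sorted(vector_path - scalar_path)


sample = """
define <5 x float> @v_bitcast_v5i32_to_v5f32(<5 x i32> %a, i32 %b) {
define <5 x i32> @v_bitcast_v5f32_to_v5i32(<5 x float> %a, i32 %b) {
define <5 x i32> @s_bitcast_v5f32_to_v5i32(<5 x float> inreg %a, i32 inreg %b) {
"""
print(missing_scalar_variants(sample))
```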

Shoreshen (Contributor, Author) commented:

> Also, what are these lit tests for? Why do we need to test all widths?
>
> Every operation with every type should be tested.
>
> For this case, we should already have test coverage, so I'm not sure I understand what is new here. We should perhaps split or replace the existing test with a more consistent pattern, but this patch is purely additive?

Hi @arsenm, I removed the old tests and split them into separate files named by bit width.

Shoreshen requested a review from arsenm on Mar 31, 2025 02:29.

Shoreshen (Contributor, Author) commented:

Hi @shiltian @arsenm, just checking whether this PR is OK. If any fix-ups are needed, please comment. Thanks!

Shoreshen merged commit 49d6e39 into llvm:main on Apr 8, 2025 (11 checks passed).
Review comment (Contributor) on the following lines:

; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck -check-prefixes=GFX9 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck -check-prefixes=GFX11 %s

define amdgpu_kernel void @bitcast_i8ptr_v16i8ptr(ptr addrspace(1) %out, ptr addrspace(1) %in) {

Why "ptr" in the name? There is no pointer cast here. This is also not testing the reverse direction, or all of the interesting type sizes. I think we should be generating all of the cases.
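The reverse-direction gap can be checked mechanically by collecting the `(src, dst)` types of every `bitcast` in a file and reporting conversions whose reverse never appears. A minimal Python sketch, assuming the types are plain scalars or fixed-size vectors as in these tests:

```python
import re

# Captures the source and destination type of "bitcast <src> %val to <dst>".
BITCAST = re.compile(r"bitcast (<[^>]*>|\w+) %[\w.]+ to (<[^>]*>|\w+)")


def missing_reverse(ll_text):
    """Return (dst, src) conversions whose forward direction is tested but not the reverse."""
    seen = {(m.group(1), m.group(2)) for m in BITCAST.finditer(ll_text)}
    return sorted((dst, src) for src, dst in seen if (dst, src) not in seen)


sample = """
  %a2 = bitcast <16 x i8> %a1 to <4 x i32>
  %a3 = bitcast <16 x i8> %a to <4 x i32>
  %b2 = bitcast <2 x i64> %b1 to <2 x double>
  %b3 = bitcast <2 x double> %b to <2 x i64>
"""
print(missing_reverse(sample))
```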

Reply from Shoreshen (Contributor, Author):

Hi @arsenm, this case is copied from the old file llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.ll, L56-L116:

define amdgpu_kernel void @i8ptr_v16i8ptr(ptr addrspace(1) %out, ptr addrspace(1) %in) {
; GCN-LABEL: i8ptr_v16i8ptr:
; GCN:       ; %bb.0: ; %entry
; GCN-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x9
; GCN-NEXT:    s_waitcnt lgkmcnt(0)
; GCN-NEXT:    s_load_dwordx4 s[4:7], s[2:3], 0x0
; GCN-NEXT:    s_mov_b32 s3, 0xf000
; GCN-NEXT:    s_mov_b32 s2, -1
; GCN-NEXT:    s_waitcnt lgkmcnt(0)
; GCN-NEXT:    v_mov_b32_e32 v0, s4
; GCN-NEXT:    v_mov_b32_e32 v1, s5
; GCN-NEXT:    v_mov_b32_e32 v2, s6
; GCN-NEXT:    v_mov_b32_e32 v3, s7
; GCN-NEXT:    buffer_store_dwordx4 v[0:3], off, s[0:3], 0
; GCN-NEXT:    s_endpgm
;
; VI-LABEL: i8ptr_v16i8ptr:
; VI:       ; %bb.0: ; %entry
; VI-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x24
; VI-NEXT:    s_waitcnt lgkmcnt(0)
; VI-NEXT:    s_load_dwordx4 s[4:7], s[2:3], 0x0
; VI-NEXT:    v_mov_b32_e32 v4, s0
; VI-NEXT:    v_mov_b32_e32 v5, s1
; VI-NEXT:    s_waitcnt lgkmcnt(0)
; VI-NEXT:    v_mov_b32_e32 v0, s4
; VI-NEXT:    v_mov_b32_e32 v1, s5
; VI-NEXT:    v_mov_b32_e32 v2, s6
; VI-NEXT:    v_mov_b32_e32 v3, s7
; VI-NEXT:    flat_store_dwordx4 v[4:5], v[0:3]
; VI-NEXT:    s_endpgm
;
; GFX9-LABEL: i8ptr_v16i8ptr:
; GFX9:       ; %bb.0: ; %entry
; GFX9-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x24
; GFX9-NEXT:    v_mov_b32_e32 v4, 0
; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
; GFX9-NEXT:    s_load_dwordx4 s[4:7], s[2:3], 0x0
; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
; GFX9-NEXT:    v_mov_b32_e32 v0, s4
; GFX9-NEXT:    v_mov_b32_e32 v1, s5
; GFX9-NEXT:    v_mov_b32_e32 v2, s6
; GFX9-NEXT:    v_mov_b32_e32 v3, s7
; GFX9-NEXT:    global_store_dwordx4 v4, v[0:3], s[0:1]
; GFX9-NEXT:    s_endpgm
;
; GFX11-LABEL: i8ptr_v16i8ptr:
; GFX11:       ; %bb.0: ; %entry
; GFX11-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
; GFX11-NEXT:    s_load_b128 s[4:7], s[2:3], 0x0
; GFX11-NEXT:    v_mov_b32_e32 v4, 0
; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
; GFX11-NEXT:    v_dual_mov_b32 v0, s4 :: v_dual_mov_b32 v3, s7
; GFX11-NEXT:    v_dual_mov_b32 v1, s5 :: v_dual_mov_b32 v2, s6
; GFX11-NEXT:    global_store_b128 v4, v[0:3], s[0:1]
; GFX11-NEXT:    s_endpgm
entry:
  %0 = load <16 x i8>, ptr addrspace(1) %in
  store <16 x i8> %0, ptr addrspace(1) %out
  ret void
}

I'm not sure how I should handle this, because the test contains no bitcast at all.
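One possible option, sketched here as an assumption rather than what the PR does: keep the load/store shape but cast the loaded value before storing it. The name presumably dates from typed pointers, when the test bitcast an `i8` pointer to a `<16 x i8>` pointer; with opaque pointers that cast no longer exists. The function name below is hypothetical. Note that the DAG combiner may fold a bitcast of a load into a load of the other type, so the argument/phi form shown in the PR description is likely a stronger test of bitcast selection.

```llvm
; Hypothetical variant of i8ptr_v16i8ptr that performs a real value bitcast.
define amdgpu_kernel void @bitcast_v16i8_to_v4i32_kernel(ptr addrspace(1) %out, ptr addrspace(1) %in) {
entry:
  %val = load <16 x i8>, ptr addrspace(1) %in
  ; The cast may be folded into the load during DAG combining.
  %cast = bitcast <16 x i8> %val to <4 x i32>
  store <4 x i32> %cast, ptr addrspace(1) %out
  ret void
}
```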

@@ -0,0 +1,68 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5

; RUN: llc -mtriple=amdgcn < %s | FileCheck -check-prefix=GCN %s
Contributor

Avoid relying on the default target; maybe use -mcpu=tahiti? Also, there are extra spaces before <.
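Applied to the RUN line under review, the suggestion would look roughly like this (tahiti is a GFX6/Southern Islands target). This is a sketch; the check lines would need to be regenerated with utils/update_llc_test_checks.py after changing the RUN line.

```llvm
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5

; RUN: llc -mtriple=amdgcn -mcpu=tahiti < %s | FileCheck -check-prefix=GCN %s
```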

Contributor

I'm currently not convinced we aren't losing test coverage. Some of the new bit-width-suffixed files don't contain enough functions to justify splitting them up. If we're splitting this up, I'd prefer generated, wholly consistent tests with the suffixes.

@llvm-ci
Collaborator

llvm-ci commented Apr 8, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/16963

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn  < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefix=GCN /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll # RUN: at line 3
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefix=GCN /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=tonga  < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=VI /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll # RUN: at line 4
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=VI /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=tonga

# After SI post-RA bundler
# Machine code for function bitcast_v64i16_to_v128i8: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Frame Objects:
  fi#-3: size=4, align=8, fixed, at location [SP+8]
  fi#-2: size=4, align=4, fixed, at location [SP+4]
  fi#-1: size=4, align=16, fixed, at location [SP]
  fi#0: size=8, align=4, at location [SP+76]
  fi#1: size=8, align=4, at location [SP+84]
  fi#2: size=8, align=4, at location [SP+92]
  fi#3: size=8, align=4, at location [SP+100]
  fi#4: size=8, align=4, at location [SP+108]
  fi#5: size=8, align=4, at location [SP+116]
  fi#6: size=8, align=4, at location [SP+124]
  fi#7: size=8, align=4, at location [SP+132]
  fi#8: size=8, align=4, at location [SP+140]
  fi#9: size=8, align=4, at location [SP+148]
  fi#10: size=4, align=4, at location [SP+156]
  fi#11: size=4, align=4, at location [SP+160]
  fi#12: size=4, align=4, at location [SP+164]
  fi#13: size=4, align=4, at location [SP+168]
  fi#14: size=4, align=4, at location [SP+172]
  fi#15: size=4, align=4, at location [SP+176]
  fi#16: size=4, align=4, at location [SP+180]
  fi#17: size=4, align=4, at location [SP+184]
  fi#18: size=4, align=4, at location [SP+188]
  fi#19: size=4, align=4, at location [SP+192]
  fi#20: size=4, align=4, at location [SP+196]
  fi#21: size=4, align=4, at location [SP+200]
  fi#22: size=4, align=4, at location [SP+204]
  fi#23: size=4, align=4, at location [SP+208]
  fi#24: size=4, align=4, at location [SP+212]
  fi#25: size=4, align=4, at location [SP+216]
  fi#26: size=4, align=4, at location [SP+220]
  fi#27: size=4, align=4, at location [SP+224]
  fi#28: size=4, align=4, at location [SP+228]
  fi#29: size=4, align=4, at location [SP+232]
  fi#30: size=4, align=4, at location [SP+236]
  fi#31: size=4, align=4, at location [SP+240]
...

@llvm-ci
Collaborator

llvm-ci commented Apr 8, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-ubuntu running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/5281

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn  < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefix=GCN /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll # RUN: at line 3
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefix=GCN /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn -mcpu=tonga  < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefixes=VI /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll # RUN: at line 4
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefixes=VI /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn -mcpu=tonga

# After SI post-RA bundler
# Machine code for function bitcast_v64i16_to_v128i8: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Frame Objects:
  fi#-3: size=4, align=8, fixed, at location [SP+8]
  fi#-2: size=4, align=4, fixed, at location [SP+4]
  fi#-1: size=4, align=16, fixed, at location [SP]
  fi#0: size=8, align=4, at location [SP+76]
  fi#1: size=8, align=4, at location [SP+84]
  fi#2: size=8, align=4, at location [SP+92]
  fi#3: size=8, align=4, at location [SP+100]
  fi#4: size=8, align=4, at location [SP+108]
  fi#5: size=8, align=4, at location [SP+116]
  fi#6: size=8, align=4, at location [SP+124]
  fi#7: size=8, align=4, at location [SP+132]
  fi#8: size=8, align=4, at location [SP+140]
  fi#9: size=8, align=4, at location [SP+148]
  fi#10: size=4, align=4, at location [SP+156]
  fi#11: size=4, align=4, at location [SP+160]
  fi#12: size=4, align=4, at location [SP+164]
  fi#13: size=4, align=4, at location [SP+168]
  fi#14: size=4, align=4, at location [SP+172]
  fi#15: size=4, align=4, at location [SP+176]
  fi#16: size=4, align=4, at location [SP+180]
  fi#17: size=4, align=4, at location [SP+184]
  fi#18: size=4, align=4, at location [SP+188]
  fi#19: size=4, align=4, at location [SP+192]
  fi#20: size=4, align=4, at location [SP+196]
  fi#21: size=4, align=4, at location [SP+200]
  fi#22: size=4, align=4, at location [SP+204]
  fi#23: size=4, align=4, at location [SP+208]
  fi#24: size=4, align=4, at location [SP+212]
  fi#25: size=4, align=4, at location [SP+216]
  fi#26: size=4, align=4, at location [SP+220]
  fi#27: size=4, align=4, at location [SP+224]
  fi#28: size=4, align=4, at location [SP+228]
  fi#29: size=4, align=4, at location [SP+232]
  fi#30: size=4, align=4, at location [SP+236]
  fi#31: size=4, align=4, at location [SP+240]
...

Shoreshen added a commit that referenced this pull request Apr 8, 2025
Shoreshen added a commit that referenced this pull request Apr 8, 2025
Function `bitcast_v64i16_to_v128i8` in the newly added test file
`llvm-project/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll` from
PR #133052 failed in the expensive-checks build (it passes the normal
lit check).

Remove it for now.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 8, 2025

6 participants