[CUDA] FpA IntB Gemm Kernel Test #25109


Open · wants to merge 3 commits into main

Conversation

tianleiwu (Contributor):

Enhance MatMulNBits CUDA kernel testing:
(1) Add a kernel test covering the different CUDA kernels used in MatMulNBits.
(2) Refactor the gemm profiler to use the CUDA allocator.
(3) Add verbose logging macros.
(4) Adjust the build to speed up compilation when sm90 is excluded.

Example kernel test output: (screenshot omitted)

@tianleiwu tianleiwu marked this pull request as draft June 18, 2025 22:03
@tianleiwu tianleiwu marked this pull request as ready for review June 19, 2025 20:38
@@ -39,7 +40,9 @@
#include "contrib_ops/cuda/llm/cutlass_heuristic.h"
#include "contrib_ops/cuda/llm/cutlass_type_conversion.h"
#include "contrib_ops/cuda/llm/fpA_intB_gemm/fpA_intB_gemm.h"
#ifndef EXCLUDE_SM_90
Contributor:

Why is sm90 particularly slow during compilation and not newer ones?

Contributor Author:

sm90 uses a new set of files here.

Blackwell GPUs will fall back to the sm80 kernels.

Since the CI pipeline uses sm75 or sm86, there is no need to compile sm90. This skips those files and should speed up the build.

@@ -374,14 +375,18 @@ void CutlassFpAIntBGemmRunner<ActivationType, WeightType, QuantOp, ScaleZeroType
dispatch_gemm_to_cutlass<ActivationType, WeightType, ScaleZeroType, BiasType, OutputType, cutlass::arch::Sm89,
QuantOp, EpilogueTag>(A, B, weight_scales, weight_zero_points, biases, alpha, C, m, n, k, group_size,
workspace_ptr, workspace_bytes, gemm_config, stream, occupancy);
#ifndef EXCLUDE_SM_90
Contributor:

Is this needed since you already do a check without macro?

Contributor Author:

This avoids compiling sm90_dispatch_gemm_to_cutlass in the CI pipeline, or when your GPU is not an H100/H200.

@@ -67,6 +58,18 @@ void kernel_launcher(int arch, Params& params, cudaStream_t s) {

EXEC(KernelType::BF16Int8Groupwise, BF16DetailsA, Int8DetailsW, ColumnMajorInterleavedForHopper, true);
EXEC(KernelType::BF16Int4Groupwise, BF16DetailsA, Int4DetailsW, ColumnMajorInterleavedForHopper, true);
#endif
} else {
// if (arch >= 89)
Contributor:

nit: remove?

Contributor Author:

This is reserved for a new op that supports alpha.

#define PRETTY_FUNCTION __PRETTY_FUNCTION__
#endif

#define ORT_LLM_VERBOSE 0 // Set to 1 for verbose, 2 for max verbosity
Contributor:

#ifndef ORT_LLM_VERBOSE
#define ORT_LLM_VERBOSE 0
#endif

So we can externally modify it?

3 participants