
[Test] Add 1D TMA regression test for issue #1842 (#2005)

Merged
LeiWang1999 merged 1 commit into main from fix/issue-1842-1d-tma-regression-test
Apr 3, 2026

Conversation

@kurisu6912 (Collaborator) commented Apr 1, 2026

Summary

Context

Issue #1842 reported that T.copy on a 1D tensor with TL_DISABLE_WARP_SPECIALIZED: True failed with:

no instance of overloaded function "tl::tma_load" matches the argument list
argument types are: (float *, const float *, int, int)

The root cause was that without WS, barrier allocation passes were skipped, so the mbarrier argument was emitted as literal 0 instead of a proper mbarrier[0] reference. PR #1840 fixed this by running TMA barrier passes regardless of WS setting.

The existing tests only covered 2D descriptor-based TMA paths. This PR adds coverage for the 1D bulk copy path (cp.async.bulk).

Closes #1842

Test plan

  • New test test_tma_lower_1d_no_warp_specialized passes on SM100 (GB200)
  • All other tests in test_tilelang_issue_tma_no_ws.py still pass (1 pre-existing skip on SM100 for sparse gemm)
  • Edge cases verified: multiple sizes (100, 256, 1024, 7168, 32768), dtypes (fp32, fp16, bf16), with/without WS
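The edge cases above form a 5 × 3 × 2 matrix. As a standalone illustration (not the merged test, which exercises one configuration), the combinations can be enumerated and paired with a per-case pass-config dict; string keys stand in for the `tilelang.PassConfigKey` entries so the sketch runs without TileLang installed:

```python
import itertools

# Regression matrix from the test plan: sizes x dtypes x WS on/off.
lengths = [100, 256, 1024, 7168, 32768]
dtypes = ["float32", "float16", "bfloat16"]
ws_disabled = [True, False]

def make_pass_configs(disable_ws):
    # String keys mirror the tilelang.PassConfigKey names used in the test.
    return {
        "TL_ENABLE_FAST_MATH": False,
        "TL_DISABLE_TMA_LOWER": False,
        "TL_DISABLE_WARP_SPECIALIZED": disable_ws,
    }

cases = [
    (length, dtype, make_pass_configs(ws))
    for length, dtype, ws in itertools.product(lengths, dtypes, ws_disabled)
]
assert len(cases) == 30  # 5 sizes x 3 dtypes x 2 WS settings
```

In a real test file this enumeration would typically become a `pytest.mark.parametrize` decorator, compiling and checking each combination separately.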

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Added CUDA regression test for TMA (Tensor Memory Accelerator) functionality on compute capability 9.0 and above, validating 1D tensor copy kernel compilation and execution correctness.

Add a regression test covering 1D single-dimension tensor TMA copy
(global -> shared -> global) with warp specialization disabled.

The underlying bug was fixed in #1840, but the test suite only covered
2D descriptor-based TMA paths. This test ensures the 1D bulk copy path
(cp.async.bulk) also works correctly with proper mbarrier allocation.

Closes #1842

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot commented Apr 1, 2026

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot (Contributor) commented Apr 1, 2026

📝 Walkthrough

A new regression test (test_tma_lower_1d_no_warp_specialized) was added to validate 1D TMA kernel compilation and execution with warp specialization disabled and TMA lowering enabled. The test verifies kernel source contains expected TMA operations and validates correct output on float32 CUDA data.

Changes

1D TMA Regression Test — testing/python/issue/test_tilelang_issue_tma_no_ws.py
Added a new test function that compiles a 1D global-to-shared-to-global copy kernel with TL_DISABLE_WARP_SPECIALIZED=True and validates the presence of tl::tma_load, mbarrier_mem, and tl::tma_store in the generated source. The test runs the kernel on random float32 input and verifies output correctness.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • LeiWang1999

Poem

🐰 A test hops in with TMA cheer,
One-D copies, loud and clear!
No warp specs, but barriers stay,
Load and store light the way! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check ✅ — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — The title concisely describes the main change: adding a 1D TMA regression test for issue #1842, which aligns with the PR's primary objective.
  • Linked Issues Check ✅ — The PR adds the regression test required by #1842 to cover 1D bulk TMA load/store with warp specialization disabled, validating compilation, mbarrier emission, and correctness.
  • Out of Scope Changes Check ✅ — The PR adds only the single regression test file as specified; no unrelated or out-of-scope code changes are present.



@kurisu6912 kurisu6912 marked this pull request as ready for review April 1, 2026 09:28
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@testing/python/issue/test_tilelang_issue_tma_no_ws.py`:
- Around line 81-99: The test only runs one configuration; extend it to iterate
the intended regression matrix by parameterizing length, dtype
(float32/float16/bfloat16), and warp-specialized (WS) enabled/disabled; modify
the test around the tma_copy_1d definition and the pass_configs block to loop
over a list of sizes, dtypes, and WS boolean values, for each build a matching
pass_configs (toggling tilelang.PassConfigKey.TL_DISABLE_WARP_SPECIALIZED) and
call _compile_tvm_ffi(tma_copy_1d, pass_configs, out_idx=[1]) for each
combination, and assert/verify outputs per combination so the test covers all
sizes, fp32/fp16/bf16, and WS on/off as intended.
- Around line 100-104: The test currently only checks that "mbarrier_mem" is
declared but not that tl::tma_load actually uses it; update the assertion after
src = kernel.get_kernel_source() to confirm the tl::tma_load invocation includes
the mbarrier reference (e.g., ensure the generated source contains a
tl::tma_load call with "mbarrier" as the barrier argument such as
"tl::tma_load(...mbarrier" or "mbarrier[") so we catch the case where the load
was passed 0 instead of the mbarrier buffer.


📥 Commits

Reviewing files that changed from the base of the PR and between a82fa71 and 3955156.

📒 Files selected for processing (1)
  • testing/python/issue/test_tilelang_issue_tma_no_ws.py

Comment on lines +81 to +99
length = 7168

@T.prim_func
def tma_copy_1d(
    a: T.Tensor((length,), T.float32),
    out: T.Tensor((length,), T.float32),
):
    with T.Kernel(1, threads=256):
        a_shared = T.alloc_shared((length,), T.float32)
        T.copy(a, a_shared)
        T.copy(a_shared, out)

pass_configs = {
    tilelang.PassConfigKey.TL_ENABLE_FAST_MATH: False,
    tilelang.PassConfigKey.TL_DISABLE_TMA_LOWER: False,
    tilelang.PassConfigKey.TL_DISABLE_WARP_SPECIALIZED: True,
}
kernel = _compile_tvm_ffi(tma_copy_1d, pass_configs, out_idx=[1])


⚠️ Potential issue | 🟠 Major

Coverage is narrower than the stated regression matrix.

This test currently exercises only one shape/dtype/WS setting. To lock issue #1842 down, please cover the intended matrix (sizes, fp32/fp16/bf16, WS on/off).

Proposed refactor (matrix coverage)
-    length = 7168
-
-    @T.prim_func
-    def tma_copy_1d(
-        a: T.Tensor((length,), T.float32),
-        out: T.Tensor((length,), T.float32),
-    ):
-        with T.Kernel(1, threads=256):
-            a_shared = T.alloc_shared((length,), T.float32)
-            T.copy(a, a_shared)
-            T.copy(a_shared, out)
-
-    pass_configs = {
-        tilelang.PassConfigKey.TL_ENABLE_FAST_MATH: False,
-        tilelang.PassConfigKey.TL_DISABLE_TMA_LOWER: False,
-        tilelang.PassConfigKey.TL_DISABLE_WARP_SPECIALIZED: True,
-    }
-    kernel = _compile_tvm_ffi(tma_copy_1d, pass_configs, out_idx=[1])
+    lengths = [100, 256, 1024, 7168, 32768]
+    dtypes = [
+        (T.float32, torch.float32),
+        (T.float16, torch.float16),
+        (T.bfloat16, torch.bfloat16),
+    ]
+    for length in lengths:
+        for tl_dtype, torch_dtype in dtypes:
+            for disable_ws in [False, True]:
+                @T.prim_func
+                def tma_copy_1d(
+                    a: T.Tensor((length,), tl_dtype),
+                    out: T.Tensor((length,), tl_dtype),
+                ):
+                    with T.Kernel(1, threads=256):
+                        a_shared = T.alloc_shared((length,), tl_dtype)
+                        T.copy(a, a_shared)
+                        T.copy(a_shared, out)
+
+                pass_configs = {
+                    tilelang.PassConfigKey.TL_ENABLE_FAST_MATH: False,
+                    tilelang.PassConfigKey.TL_DISABLE_TMA_LOWER: False,
+                    tilelang.PassConfigKey.TL_DISABLE_WARP_SPECIALIZED: disable_ws,
+                }
+                kernel = _compile_tvm_ffi(tma_copy_1d, pass_configs, out_idx=[1])
 
-    src = kernel.get_kernel_source()
-    assert "tl::tma_load" in src
-    assert "mbarrier_mem" in src
-    assert "tl::tma_store" in src
+                src = kernel.get_kernel_source()
+                assert "tl::tma_load" in src
+                assert "mbarrier_mem" in src
+                assert "tl::tma_store" in src
 
-    t = torch.randn((length,), device="cuda", dtype=torch.float32)
-    out = kernel(t)
-    torch.testing.assert_close(out, t)
+                t = torch.randn((length,), device="cuda", dtype=torch_dtype)
+                out = kernel(t)
+                torch.testing.assert_close(out, t)
     torch.cuda.synchronize()

Also applies to: 105-107


Comment on lines +100 to +104
src = kernel.get_kernel_source()
assert "tl::tma_load" in src
assert "mbarrier_mem" in src
assert "tl::tma_store" in src


⚠️ Potential issue | 🟠 Major

Assert mbarrier is passed into tl::tma_load, not just declared.

Line 102 only proves barrier storage exists; it doesn’t catch the original failure mode (tl::tma_load(..., 0, ...)). Please assert the load call uses mbarrier[...] directly.

Proposed fix
     src = kernel.get_kernel_source()
     assert "tl::tma_load" in src
     assert "mbarrier_mem" in src
     assert "tl::tma_store" in src
+    flat_src = " ".join(src.split())
+    assert re.search(r"tl::tma_load\([^,]+,\s*mbarrier\[[0-9]+\]", flat_src)
+    assert not re.search(r"tl::tma_load\([^,]+,\s*0\b", flat_src)

Based on learnings: validating transformation behavior via generated kernel source pattern checks is an expected and appropriate testing strategy.
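To illustrate what the proposed patterns catch, here is a standalone sketch run against hypothetical source fragments (the real output of kernel.get_kernel_source() will differ in detail; note the reviewer's diff also requires `import re` in the test file):

```python
import re

# Hypothetical generated-source fragments, for illustration only.
good_src = "tl::tma_load(A_desc, mbarrier[0], a_shared, 0);"
bad_src = "tl::tma_load(A_desc, 0, a_shared, 0);"  # the original failure mode

def load_uses_mbarrier(src):
    # Collapse whitespace so the patterns match across line breaks.
    flat = " ".join(src.split())
    # The barrier argument must be mbarrier[N], not the literal 0.
    has_mbarrier = re.search(r"tl::tma_load\([^,]+,\s*mbarrier\[[0-9]+\]", flat)
    has_literal_zero = re.search(r"tl::tma_load\([^,]+,\s*0\b", flat)
    return has_mbarrier is not None and has_literal_zero is None

assert load_uses_mbarrier(good_src)
assert not load_uses_mbarrier(bad_src)
```

Checking the second positional argument directly is what distinguishes a properly lowered load from the broken tl::tma_load(..., 0, ...) call reported in #1842.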


@kurisu6912 kurisu6912 linked an issue Apr 2, 2026 that may be closed by this pull request
2 tasks
@LeiWang1999 LeiWang1999 merged commit 5f70374 into main Apr 3, 2026
5 checks passed
@LeiWang1999 LeiWang1999 deleted the fix/issue-1842-1d-tma-regression-test branch April 14, 2026 06:06


Development

Successfully merging this pull request may close these issues:

  • [BUG] 1D TMA load module fails
  • [BUG] TMA fails

2 participants