[Bugfix] Raise error on zero grid dimension instead of silent clamp#1994
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughUpdated the TVM submodule reference and added a CUDA-only regression test validating TileLang behavior when Changes
sequenceDiagram
participant Test as "Test (pytest)"
participant TileLang as "TileLang JIT kernel"
participant TVM as "TVM runtime"
participant CUDA as "CUDA driver"
Test->>TileLang: invoke kernel(out, num_tokens)
TileLang->>TVM: dispatch kernel launch (grid dims)
alt any grid dim == 0
TVM->>TVM: detect zero grid dim (guard)
TVM-->>TileLang: raise error OR skip launch
TileLang-->>Test: raise exception / no-op return
else all dims > 0
TVM->>CUDA: cuLaunchKernelEx(...)
CUDA-->>TVM: launch success/failure
TVM-->>TileLang: return status
TileLang-->>Test: return / possible async error on sync
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
testing/python/issue/test_tilelang_issue_1993.py (1)
67-72: Same exception type refinement applies here.Same recommendation as above—use a specific exception type. The buffer assertion on line 72 is good practice to verify no spurious writes occurred.
♻️ Suggested improvement
- with pytest.raises(Exception): + with pytest.raises(RuntimeError, match=r"grid dimension must be positive"): kernel(out) torch.cuda.synchronize()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/python/issue/test_tilelang_issue_1993.py` around lines 67 - 72, Replace the broad pytest.raises(Exception) with the specific exception class actually raised by kernel(out) or torch.cuda.synchronize() (e.g., RuntimeError or the exact CUDA error type your code emits); update the context manager from pytest.raises(Exception) to pytest.raises(<ExactExceptionClass>) while keeping the kernel(out) and torch.cuda.synchronize() calls intact and preserving the final buffer check assert out.eq(0.0).all().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@testing/python/issue/test_tilelang_issue_1993.py`:
- Around line 49-53: Replace the broad Exception catches with the specific
tvm.TVMError in both test functions test_issue_1993_dynamic_zero_grid_dim and
test_issue_1993_static_zero_grid_dim: change the with pytest.raises(Exception):
blocks that wrap kernel(out_empty) and torch.cuda.synchronize() to with
pytest.raises(tvm.TVMError): and add the import statement from tilelang import
tvm near the top of the test file if it’s missing so the tvm symbol is
available.
---
Nitpick comments:
In `@testing/python/issue/test_tilelang_issue_1993.py`:
- Around line 67-72: Replace the broad pytest.raises(Exception) with the
specific exception class actually raised by kernel(out) or
torch.cuda.synchronize() (e.g., RuntimeError or the exact CUDA error type your
code emits); update the context manager from pytest.raises(Exception) to
pytest.raises(<ExactExceptionClass>) while keeping the kernel(out) and
torch.cuda.synchronize() calls intact and preserving the final buffer check
assert out.eq(0.0).all().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 323df7b9-c661-48bf-a105-9e109ce72a4c
📒 Files selected for processing (2)
3rdparty/tvmtesting/python/issue/test_tilelang_issue_1993.py
| # Zero case: should raise an error, not crash with illegal memory access | ||
| out_empty = torch.zeros(0, dtype=torch.float32, device="cuda") | ||
| with pytest.raises(Exception): | ||
| kernel(out_empty) | ||
| torch.cuda.synchronize() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Search for how ICHECK failures surface as exceptions in TVM/TileLang
# to determine the correct exception type to catch
# Look for existing tests that catch TVM runtime errors
rg -n "pytest\.raises.*RuntimeError|pytest\.raises.*TVMError" --type=py testing/Repository: tile-ai/tilelang
Length of output: 499
🏁 Script executed:
# Read the test file around the flagged lines to understand context
fd test_tilelang_issue_1993.py --type=f -x cat -n {}Repository: tile-ai/tilelang
Length of output: 2609
🏁 Script executed:
# Search for ICHECK statements in the codebase to understand what exception they raise
rg -n "ICHECK" --type=py --type=cpp --type=cc -B 2 -A 2 | head -100Repository: tile-ai/tilelang
Length of output: 86
🏁 Script executed:
# Search for grid dimension validation or error messages
rg -i "grid.*dimension|dimension.*grid" --type=py --type=cpp -B 2 -A 2Repository: tile-ai/tilelang
Length of output: 4717
🏁 Script executed:
# Look for kernel grid or block dimension validation
rg -n "T\.Kernel" --type=py -B 2 -A 2 | head -80Repository: tile-ai/tilelang
Length of output: 6797
🏁 Script executed:
# Check what tvm.TVMError is and if it's imported in test files
rg -n "import.*tvm|from.*tvm" testing/python/ --type=py | grep -E "TVMError|tvm" | head -20Repository: tile-ai/tilelang
Length of output: 2176
🏁 Script executed:
# Search for grid dimension validation/checking code in kernel-related files
rg -n "grid|block|kernel" tilelang/language/kernel.py --type=py | head -60Repository: tile-ai/tilelang
Length of output: 3567
🏁 Script executed:
# Look for ICHECK patterns that might validate grid dimensions
rg -n "ICHECK.*grid|grid.*0|dimension.*0" --type=py --type=cpp | head -40Repository: tile-ai/tilelang
Length of output: 4297
🏁 Script executed:
# Check if tvm is imported or accessible in the tilelang package
rg -n "from.*tvm|import.*tvm" tilelang/ --type=py | head -20Repository: tile-ai/tilelang
Length of output: 1422
🏁 Script executed:
# Search for where grid dimension 0 is validated or checked
rg -n "grid.*>.*0|grid.*== 0|grid.*!= 0|positive.*grid" --type=py --type=cpp | head -30Repository: tile-ai/tilelang
Length of output: 801
🏁 Script executed:
# Look for ICHECK statements related to grid that might validate dimension > 0
rg -n "ICHECK.*grid" --type=cpp -B 2 -A 2Repository: tile-ai/tilelang
Length of output: 1946
🏁 Script executed:
# Search in the wrapper/adapter code where kernel launches are prepared
rg -n "gridDim|grid_info" tilelang/jit/adapter/wrapper.py --type=py -B 3 -A 3 | head -80Repository: tile-ai/tilelang
Length of output: 3694
🏁 Script executed:
# Search for how grid dimension validation happens in kernel launch
rg -n "dynamic.*grid|grid.*dynamic" tilelang/jit/ --type=py -B 2 -A 2Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Look at the exact test file to see what's imported and what's available
cat -n testing/python/issue/test_tilelang_issue_1993.pyRepository: tile-ai/tilelang
Length of output: 2609
🏁 Script executed:
# Search for where the kernel is actually compiled/launched to see where the validation happens
rg -n "def kernel|LaunchThread|KernelLaunch" tilelang/language/kernel.py --type=py -B 2 -A 5 | head -60Repository: tile-ai/tilelang
Length of output: 2081
🏁 Script executed:
# Search for where runtime validation happens for grid dimensions
rg -n "must be positive|positive.*grid|grid.*positive|must be.*non-zero" --type=py --type=cppRepository: tile-ai/tilelang
Length of output: 2770
🏁 Script executed:
# Look for exception raising in the wrapper code
rg -n "raise.*Error|raise.*Exception|RuntimeError|TVMError" tilelang/jit/adapter/wrapper.py --type=py -B 2 -A 2 | head -60Repository: tile-ai/tilelang
Length of output: 2595
🏁 Script executed:
# Check if there's any validation in the tilelang package that throws on grid dimension 0
rg -n "grid.*== 0|grid_info.*0" tilelang/jit/ --type=py -B 3 -A 3Repository: tile-ai/tilelang
Length of output: 2298
🏁 Script executed:
# Check if tvm or TVMError is accessible from tilelang package exports
rg -n "TVMError|from tvm import" tilelang/__init__.py --type=py -B 2 -A 2Repository: tile-ai/tilelang
Length of output: 237
🏁 Script executed:
# Look at how the test_tilelang_transform_make_packed_api.py imports and uses TVMError
cat -n testing/python/transform/test_tilelang_transform_make_packed_api.py | head -40Repository: tile-ai/tilelang
Length of output: 1585
🏁 Script executed:
# Check the runtime validation code to understand what exception is raised
cat -n src/runtime/runtime.cc | sed -n '340,365p'Repository: tile-ai/tilelang
Length of output: 1366
🏁 Script executed:
# Look at how grid dimensions are validated and what exception is raised
rg -n "issues.push_back.*globalDim" src/runtime/runtime.cc -B 5 -A 10Repository: tile-ai/tilelang
Length of output: 1630
🏁 Script executed:
# Check what happens with the issues list - how they're converted to exceptions
rg -n "issues" src/runtime/runtime.cc | head -40Repository: tile-ai/tilelang
Length of output: 2618
🏁 Script executed:
# Verify if tvm.TVMError is the correct exception by looking at how it's used in similar validation tests
cat -n testing/python/transform/test_tilelang_transform_make_packed_api.py | sed -n '170,195p'Repository: tile-ai/tilelang
Length of output: 813
Use tvm.TVMError instead of the overly broad Exception.
The test currently catches Exception, which could mask unrelated failures. Based on the runtime validation logic, when grid dimensions resolve to zero, a tvm.TVMError is raised with the message "globalDim[0] must be non-zero". This matches the pattern used in other validation tests in the codebase.
Update both test functions to use the specific exception type:
♻️ Suggested improvements
In test function test_issue_1993_dynamic_zero_grid_dim() (lines 51-53):
- with pytest.raises(Exception):
+ with pytest.raises(tvm.TVMError):
kernel(out_empty)
torch.cuda.synchronize()In test function test_issue_1993_static_zero_grid_dim() (lines 67-69):
- with pytest.raises(Exception):
+ with pytest.raises(tvm.TVMError):
kernel(out)
torch.cuda.synchronize()Note: You'll need to import tvm in the test file. Add from tilelang import tvm near the top if not already present.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Zero case: should raise an error, not crash with illegal memory access | |
| out_empty = torch.zeros(0, dtype=torch.float32, device="cuda") | |
| with pytest.raises(Exception): | |
| kernel(out_empty) | |
| torch.cuda.synchronize() | |
| # Zero case: should raise an error, not crash with illegal memory access | |
| out_empty = torch.zeros(0, dtype=torch.float32, device="cuda") | |
| with pytest.raises(tvm.TVMError): | |
| kernel(out_empty) | |
| torch.cuda.synchronize() |
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 51-51: Do not assert blind exception: Exception
(B017)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@testing/python/issue/test_tilelang_issue_1993.py` around lines 49 - 53,
Replace the broad Exception catches with the specific tvm.TVMError in both test
functions test_issue_1993_dynamic_zero_grid_dim and
test_issue_1993_static_zero_grid_dim: change the with pytest.raises(Exception):
blocks that wrap kernel(out_empty) and torch.cuda.synchronize() to with
pytest.raises(tvm.TVMError): and add the import statement from tilelang import
tvm near the top of the test file if it’s missing so the tvm symbol is
available.
Fixes #1993.
TVM's
ThreadWorkLoad::Extract()pre-fills all grid/block dims with 1, and only overwrites them when the actual value is > 0. So if a grid dimension is 0 — either from a dynamic shape likenum_tokensbeing 0 at runtime, or from a literalT.Kernel(0)— the kernel still launches withgrid=(1,1,1). No error, no warning. This causes either aCUDA_ERROR_ILLEGAL_ADDRESScrash (the zero-size tensor hasdata_ptr()==NULL, so the kernel writes through a null pointer) or silent wrong results depending on constant folding.The fix is straightforward:
thread_storage_scope.h, stop clamping: just write the actual size unconditionally, so a 0 stays 0.cuda_module.cc, add anICHECKbefore anycuLaunchKernel*call that checks all three grid dims are positive, with a message that tells you exactly which dim is zero and which kernel it is.For example, instead of getting the cryptic async
CUDA_ERROR_ILLEGAL_ADDRESSatsynchronize(), you now get:Regression test included for both the dynamic and static cases.
Summary by CodeRabbit
Tests
Chores