fix: add cudaGetLastError check after cuLaunchKernel in TVM FFI backend#2000
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
4d6c49c to
87831ff
Compare
📝 WalkthroughWalkthroughThe Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
3rdparty/tvm (1)
1-1: Consider broader test coverage beyond a single gemm test.The test plan mentions running only
test_tilelang_kernel_gemm::test_gemm_f16f16f16_nn. Given that this change affects CUDA error reporting infrastructure in the TVM FFI backend, consider:
- Running a broader suite of CUDA kernel tests to ensure the error-checking mechanism doesn't introduce performance regressions or break existing functionality
- Testing error scenarios specifically (e.g., kernels that intentionally trigger CUDA errors) to verify the new error reporting works as intended
- Running CI/CD integration tests if available
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@3rdparty/tvm` at line 1, Update the test plan to expand coverage beyond test_tilelang_kernel_gemm::test_gemm_f16f16f16_nn by adding a suite of CUDA kernel tests (e.g., additional tilelang kernel tests, varied GEMM shapes/precisions, and other CUDA-backed operators) to exercise the TVM FFI CUDA error reporting path, add dedicated tests that intentionally trigger CUDA errors (invalid launch config, out-of-memory, illegal memory access) to assert the new error-reporting messages and stack traces, and ensure these tests are included in CI runs or a nightly integration job to catch regressions in performance or correctness for the FFI backend error-checking logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@3rdparty/tvm`:
- Line 1: The current TVM verification script cannot run in an uninitialized
submodule and references a non-existent commit; replace it with a non-git-based
checker that verifies the required include files referenced by
cmake/load_tvm.cmake (check existence of include/tvm.h,
src/runtime/c_runtime_api.h, 3rdparty/dlpack/include/dlpack/dlpack.h, and
3rdparty/dmlc-core/include/dmlc/logging.h) and fail with a clear message listing
missing paths instead of performing git checkout or referencing commit
12b47d316230fc777d13d4199200530e8c9529e1; alternatively remove the script and
add documentation noting the compatibility gaps found against the TVM commit
882a774844993d103ae6e317ba3c7bbb5952b662 (tilelang_main) and the specific
missing include paths.
---
Nitpick comments:
In `@3rdparty/tvm`:
- Line 1: Update the test plan to expand coverage beyond
test_tilelang_kernel_gemm::test_gemm_f16f16f16_nn by adding a suite of CUDA
kernel tests (e.g., additional tilelang kernel tests, varied GEMM
shapes/precisions, and other CUDA-backed operators) to exercise the TVM FFI CUDA
error reporting path, add dedicated tests that intentionally trigger CUDA errors
(invalid launch config, out-of-memory, illegal memory access) to assert the new
error-reporting messages and stack traces, and ensure these tests are included
in CI runs or a nightly integration job to catch regressions in performance or
correctness for the FFI backend error-checking logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| @@ -1 +1 @@ | |||
| Subproject commit 12b47d316230fc777d13d4199200530e8c9529e1 | |||
| Subproject commit 882a774844993d103ae6e317ba3c7bbb5952b662 | |||
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify TVM's CMake interface remains compatible after the submodule update.
cd 3rdparty/tvm || exit 1
# Check out the new commit
git checkout 882a774844993d103ae6e317ba3c7bbb5952b662 2>/dev/null
echo "=== Checking required CMake files exist ==="
test -f cmake/config.cmake && echo "✓ cmake/config.cmake exists" || echo "✗ cmake/config.cmake MISSING"
test -f CMakeLists.txt && echo "✓ CMakeLists.txt exists" || echo "✗ CMakeLists.txt MISSING"
echo -e "\n=== Checking required include directories exist ==="
test -d include && echo "✓ include/ exists" || echo "✗ include/ MISSING"
test -d src && echo "✓ src/ exists" || echo "✗ src/ MISSING"
test -d 3rdparty/dlpack/include && echo "✓ 3rdparty/dlpack/include exists" || echo "✗ 3rdparty/dlpack/include MISSING"
test -d 3rdparty/dmlc-core/include && echo "✓ 3rdparty/dmlc-core/include exists" || echo "✗ 3rdparty/dmlc-core/include MISSING"
echo -e "\n=== Checking FFI include directory (one of these should exist) ==="
test -d ffi/include && echo "✓ ffi/include exists" || echo "✗ ffi/include not found"
test -d 3rdparty/tvm-ffi/include && echo "✓ 3rdparty/tvm-ffi/include exists" || echo "✗ 3rdparty/tvm-ffi/include not found"
echo -e "\n=== Searching for 'tvm' target definition in CMakeLists.txt ==="
rg -n "add_library\(tvm" CMakeLists.txt || echo "⚠ 'tvm' library target not found via add_library"
echo -e "\n=== Comparing structure between old and new commits ==="
echo "Old commit file structure:"
git ls-tree -r --name-only 12b47d316230fc777d13d4199200530e8c9529e1 | rg "^(cmake/config\.cmake|CMakeLists\.txt|include/|src/|3rdparty/)" | head -20
echo -e "\nNew commit file structure:"
git ls-tree -r --name-only 882a774844993d103ae6e317ba3c7bbb5952b662 | rg "^(cmake/config\.cmake|CMakeLists\.txt|include/|src/|3rdparty/)" | head -20Repository: tile-ai/tilelang
Length of output: 1086
🏁 Script executed:
#!/bin/bash
# Check if the 3rdparty/tvm directory exists and its state
echo "=== Checking 3rdparty/tvm directory ==="
ls -la 3rdparty/ | grep -E "^d.*tvm"
echo ""
# Check if it's a git submodule
echo "=== Checking .gitmodules ==="
if [ -f .gitmodules ]; then
cat .gitmodules
else
echo ".gitmodules not found"
fi
echo ""
# Check git submodule status
echo "=== Git submodule status ==="
git submodule status 3rdparty/tvm 2>/dev/null || echo "No submodule info found"
echo ""
# Check if 3rdparty/tvm/.git exists
echo "=== Checking 3rdparty/tvm/.git ==="
if [ -d 3rdparty/tvm/.git ]; then
echo "✓ .git found in 3rdparty/tvm"
cd 3rdparty/tvm && git log --oneline -5 && cd - > /dev/null
else
echo "✗ No .git directory found in 3rdparty/tvm"
fi
echo ""
# Check cmake/load_tvm.cmake for actual TVM interface expectations
echo "=== Checking cmake/load_tvm.cmake ==="
if [ -f cmake/load_tvm.cmake ]; then
wc -l cmake/load_tvm.cmake
echo "---"
head -50 cmake/load_tvm.cmake
else
echo "cmake/load_tvm.cmake not found"
fiRepository: tile-ai/tilelang
Length of output: 1614
🏁 Script executed:
#!/bin/bash
# Check what git knows about the TVM submodule remote
echo "=== Checking git config for submodule ==="
git config --file .gitmodules --get submodule.3rdparty/tvm.url
echo -e "\n=== Attempting to verify commit via git ls-remote ==="
git ls-remote https://github.com/TileLang/tvm 882a774844993d103ae6e317ba3c7bbb5952b662 2>&1 | head -5
echo -e "\n=== Checking main/master branch refs ==="
git ls-remote --heads https://github.com/TileLang/tvm 2>&1 | head -10
echo -e "\n=== Checking if this is a detached submodule (commit-based, not branch-based) ==="
# The '-' prefix in submodule status means it's at a detached commit
echo "Submodule is at detached commit: 882a774844993d103ae6e317ba3c7bbb5952b662"
echo "This is expected - submodules are typically pinned to specific commits."
echo -e "\n=== Full cmake/load_tvm.cmake for reference ==="
cat cmake/load_tvm.cmakeRepository: tile-ai/tilelang
Length of output: 2047
🏁 Script executed:
#!/bin/bash
echo "=== Checking all refs (heads, tags, remotes) for TileLang/tvm ==="
git ls-remote https://github.com/TileLang/tvm | grep 882a774844993d103ae6e317ba3c7bbb5952b662
echo -e "\n=== Checking tags in TileLang/tvm ==="
git ls-remote --tags https://github.com/TileLang/tvm 2>&1 | tail -20
echo -e "\n=== Checking if commit exists in main branch history ==="
git ls-remote https://github.com/TileLang/tvm refs/heads/main | awk '{print $1}'
echo -e "\n=== Checking CMakeLists.txt existence in repo root ==="
if [ -f CMakeLists.txt ]; then
echo "✓ CMakeLists.txt exists in repository root"
head -20 CMakeLists.txt
else
echo "✗ CMakeLists.txt NOT in repository root"
fi
echo -e "\n=== Checking if there are any existing build tests or CI that validates TVM ==="
find . -maxdepth 3 -name "*.yml" -o -name "*.yaml" 2>/dev/null | grep -E "(ci|github|test|workflow)" | head -5Repository: tile-ai/tilelang
Length of output: 1478
🏁 Script executed:
#!/bin/bash
echo "=== Verifying TVM CMake interface at commit 882a774844993d103ae6e317ba3c7bbb5952b662 ==="
echo "Checking files and directories in TileLang/tvm at this commit..."
echo ""
# Use git archive to check what files exist at that commit without cloning
echo "=== Checking if required CMake files exist in TVM commit ==="
git ls-tree -r --name-only https://github.com/TileLang/tvm 882a774844993d103ae6e317ba3c7bbb5952b662 2>&1 | grep -E "^(CMakeLists\.txt|cmake/config\.cmake)" || echo "✗ Required cmake files may not exist"
echo -e "\n=== Checking include directories structure ==="
git ls-tree -r --name-only https://github.com/TileLang/tvm 882a774844993d103ae6e317ba3c7bbb5952b662 2>&1 | grep -E "^(include/|src/|3rdparty/dlpack/include|3rdparty/dmlc-core/include|ffi/include|3rdparty/tvm-ffi/include)" | head -20
echo -e "\n=== Alternative: Using github raw content API ==="
echo "Attempting to fetch CMakeLists.txt from commit..."
curl -s -I "https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/CMakeLists.txt" | head -1
echo -e "\nAttempting to fetch cmake/config.cmake from commit..."
curl -s -I "https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/cmake/config.cmake" | head -1
echo -e "\nAttempting to fetch include/tvm.h from commit..."
curl -s -I "https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/include/tvm.h" | head -1Repository: tile-ai/tilelang
Length of output: 604
🏁 Script executed:
#!/bin/bash
echo "=== Fetching CMakeLists.txt from TVM commit ==="
curl -s "https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/CMakeLists.txt" | head -50
echo -e "\n=== Fetching cmake/config.cmake from TVM commit ==="
curl -s "https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/cmake/config.cmake" | head -80
echo -e "\n=== Checking directory structure via GitHub API ==="
# Try to get the root directory listing
curl -s "https://api.github.com/repos/TileLang/tvm/contents/?ref=882a774844993d103ae6e317ba3c7bbb5952b662" 2>&1 | grep -o '"name":"[^"]*"' | head -20Repository: tile-ai/tilelang
Length of output: 4719
🏁 Script executed:
#!/bin/bash
echo "=== Checking include directory structure in TVM commit ==="
curl -s "https://api.github.com/repos/TileLang/tvm/contents/include?ref=882a774844993d103ae6e317ba3c7bbb5952b662" 2>&1 | head -50
echo -e "\n=== Checking src directory ==="
curl -s "https://api.github.com/repos/TileLang/tvm/contents/src?ref=882a774844993d103ae6e317ba3c7bbb5952b662" 2>&1 | head -20
echo -e "\n=== Checking 3rdparty subdirectories ==="
curl -s "https://api.github.com/repos/TileLang/tvm/contents/3rdparty?ref=882a774844993d103ae6e317ba3c7bbb5952b662" 2>&1 | grep -o '"name":"[^"]*"'
echo -e "\n=== Checking dlpack structure ==="
curl -s "https://api.github.com/repos/TileLang/tvm/contents/3rdparty/dlpack?ref=882a774844993d103ae6e317ba3c7bbb5952b662" 2>&1 | grep -E '"name"' | head -10
echo -e "\n=== Checking dmlc-core structure ==="
curl -s "https://api.github.com/repos/TileLang/tvm/contents/3rdparty/dmlc-core?ref=882a774844993d103ae6e317ba3c7bbb5952b662" 2>&1 | grep -E '"name"' | head -10
echo -e "\n=== Summary: Verifying cmake/load_tvm.cmake expectations ==="
echo "Expected paths per cmake/load_tvm.cmake:"
echo " - include/ → $(curl -s -I 'https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/include/tvm.h' 2>&1 | grep -oE 'HTTP/[0-9.]+ [0-9]+' | awk '{print $2}')"
echo " - src/ → $(curl -s -I 'https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/src/' 2>&1 | grep -oE 'HTTP/[0-9.]+ [0-9]+' | awk '{print $2}')"
echo " - 3rdparty/dlpack/include → $(curl -s -I 'https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/3rdparty/dlpack/include/' 2>&1 | grep -oE 'HTTP/[0-9.]+ [0-9]+' | awk '{print $2}')"
echo " - 3rdparty/dmlc-core/include → $(curl -s -I 'https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/3rdparty/dmlc-core/include/' 2>&1 | grep -oE 'HTTP/[0-9.]+ [0-9]+' | awk '{print $2}')"Repository: tile-ai/tilelang
Length of output: 1033
🏁 Script executed:
#!/bin/bash
echo "=== Direct check: Critical TVM headers existence ==="
echo "Checking if key header files exist at commit 882a774844993d103ae6e317ba3c7bbb5952b662..."
# Check for various possible header locations
for path in \
"include/tvm.h" \
"include/tvm/ir/expr.h" \
"src/runtime/c_runtime_api.h" \
"3rdparty/dlpack/include/dlpack/dlpack.h" \
"3rdparty/dmlc-core/include/dmlc/logging.h"
do
status=$(curl -s -o /dev/null -w "%{http_code}" "https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/$path")
result="✗ MISSING (HTTP $status)"
[ "$status" = "200" ] && result="✓ EXISTS (HTTP $status)"
printf " %-50s %s\n" "$path" "$result"
done
echo -e "\n=== Checking if include structure is under a subdirectory ==="
for path in \
"src/runtime/c_runtime_api.h" \
"include/tvm/runtime/c_runtime_api.h"
do
status=$(curl -s -o /dev/null -w "%{http_code}" "https://raw.githubusercontent.com/TileLang/tvm/882a774844993d103ae6e317ba3c7bbb5952b662/$path")
[ "$status" = "200" ] && echo "Found: $path"
done
echo -e "\n=== Cross-check: Review comment verification script issues ==="
echo "The verification script in the review comment has structural issues:"
echo "1. It assumes 3rdparty/tvm is initialized (git checkout won't work on uninitialized submodule)"
echo "2. It references 2 different commit hashes for comparison (12b47d316... and 882a774844...)"
echo "3. The old commit hash doesn't exist in TileLang/tvm repo"
echo ""
echo "Available TileLang/tvm branches:"
git ls-remote --heads https://github.com/TileLang/tvmRepository: tile-ai/tilelang
Length of output: 2405
Fix TVM CMake build interface verification approach.
The concern about TVM CMake compatibility is valid—the TVM commit at 882a774844993d103ae6e317ba3c7bbb5952b662 (in the tilelang_main branch) is missing critical include paths expected by cmake/load_tvm.cmake, including:
include/tvm.h(404)src/runtime/c_runtime_api.h(404)3rdparty/dlpack/include/dlpack/dlpack.h(404)3rdparty/dmlc-core/include/dmlc/logging.h(404)
However, the provided verification script cannot execute because:
- It attempts
git checkouton an uninitialized submodule (no.gitdirectory present in sandbox) - It references commit
12b47d316230fc777d13d4199200530e8c9529e1which does not exist in the TileLang/tvm fork - The script structure assumes the submodule is fully cloned, which is not the case
Either remove the script and document the specific compatibility issues found, or replace it with a working verification approach that checks required paths without relying on git operations on uninitialized submodules.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@3rdparty/tvm` at line 1, The current TVM verification script cannot run in an
uninitialized submodule and references a non-existent commit; replace it with a
non-git-based checker that verifies the required include files referenced by
cmake/load_tvm.cmake (check existence of include/tvm.h,
src/runtime/c_runtime_api.h, 3rdparty/dlpack/include/dlpack/dlpack.h, and
3rdparty/dmlc-core/include/dmlc/logging.h) and fail with a clear message listing
missing paths instead of performing git checkout or referencing commit
12b47d316230fc777d13d4199200530e8c9529e1; alternatively remove the script and
add documentation noting the compatibility gaps found against the TVM commit
882a774844993d103ae6e317ba3c7bbb5952b662 (tilelang_main) and the specific
missing include paths.
|
fake ci error |
|
@regression-perf |
Performance Regression Test ReportTriggered by: @LeiWang1999 Results
Artifacts
|
Summary
cudaPeekAtLastError()check aftercuLaunchKernelin the TVM FFI backendcuLaunchKernel's return value does not captureTILELANG_CHECK_LAST_ERROR:kernel_name: error_name - error_stringUpstream TVM PR: tile-ai/tvm#30
Fixes #1929
Test plan
test_tilelang_kernel_gemm::test_gemm_f16f16f16_nnpasses — normal path unaffected🤖 Generated with Claude Code
Summary by CodeRabbit