[TIR] Add cooperative_tensor builtins and metal.cooperative_tensor storage scope by oraluben · Pull Request #34 · tile-ai/tvm

oraluben · 2026-04-20T05:09:53Z

Summary

Add 4 TIR builtins for Metal cooperative_tensor operations (MetalPerformancePrimitives / Metal 4):
- cooperative_tensor_fill
- cooperative_tensor_load
- cooperative_tensor_store
- cooperative_tensor_multiply_accumulate
Add metal.cooperative_tensor storage scope (StorageRank::kMetalCooperativeTensor)

Motivation

MetalPerformancePrimitives (MPP) provides matmul2d with cooperative_tensor operands that route to NAX tensor cores on Apple M5 and fall back to simdgroup matrix on M1-M4. These TIR builtins enable Metal backend codegen to emit MPP calls, analogous to the existing simdgroup_* builtins for the older Metal simdgroup matrix API.

Changes

include/tvm/tir/builtin.h — 4 new Op declarations
src/tir/op/builtin.cc — 4 new Op registrations
python/tvm/tir/op.py — Python wrapper functions
python/tvm/script/ir_builder/tir/ir.py — Script parser exports + __all__
src/runtime/thread_storage_scope.h — kMetalCooperativeTensor StorageRank + scope string parsing

Companion tilelang PR uses these builtins for Metal GEMM codegen targeting MPP matmul2d.

…orage scope Add TIR builtins for Metal cooperative_tensor operations (MetalPerformancePrimitives): - cooperative_tensor_fill: fill a cooperative_tensor with a value - cooperative_tensor_load: load from device/threadgroup memory - cooperative_tensor_store: store to device/threadgroup memory - cooperative_tensor_multiply_accumulate: matrix multiply-accumulate via matmul2d Add metal.cooperative_tensor storage scope (StorageRank::kMetalCooperativeTensor) for buffers backed by MPP cooperative_tensor registers, analogous to the existing metal.simdgroup scope but targeting the Metal 4 tensor operations API. These primitives enable code generation for MetalPerformancePrimitives matmul2d, which routes to NAX tensor cores on Apple M5 and falls back to simdgroup matrix instructions on M1-M4.

MTLLanguageVersion4_0 is only available in macOS 26+ SDK. Fall back to 3_1 (macOS 14+) or 3_0 for older SDKs to fix CI builds.

oraluben · 2026-04-26T03:30:15Z

Downstream PR: tile-ai/tilelang#1869 ([Metal] Add Metal GEMM support with cooperative_tensor MMA)

This PR adds the TIR builtins (cooperative_tensor_fill, cooperative_tensor_load, cooperative_tensor_store, cooperative_tensor_multiply_accumulate) and metal.cooperative_tensor storage scope that tile-ai/tilelang#1869 depends on.

Regarding testing: the TIR builtins are exercised end-to-end through tilelang's Metal GEMM tests (18 tests covering correctness across various tile configurations). We're not sure what additional TVM-level tests would be appropriate for these builtins — if there's a specific test pattern you'd like us to add (e.g. TIR script roundtrip tests, or builtin registration checks), we're happy to do so.

oraluben · 2026-04-26T03:35:04Z

Upstream PR: apache#19423 (same change targeting apache/tvm main)

The cython backend uses strict __slots__ on ObjectBase, which prevents setting _inst on PyFunctionPass without declaring it. This works with ctypes backend (which has __dict__) but fails on CI with cython.

This reverts commit d7158e9.

oraluben force-pushed the metal-cooperative-tensor branch 2 times, most recently from d1bc7ee to afe168b Compare April 20, 2026 07:18

oraluben mentioned this pull request Apr 26, 2026

fix: guard MTLLanguageVersion4_0 with SDK version check #36

Closed

oraluben added 2 commits April 26, 2026 11:14

fix: guard MTLLanguageVersion4_0 with SDK version check

29e5bc8

MTLLanguageVersion4_0 is only available in macOS 26+ SDK. Fall back to 3_1 (macOS 14+) or 3_0 for older SDKs to fix CI builds.

oraluben force-pushed the metal-cooperative-tensor branch from 2afb490 to 29e5bc8 Compare April 26, 2026 03:21

This was referenced Apr 26, 2026

[Metal] Add Metal GEMM support with simdgroup_matrix MMA tile-ai/tilelang#1869

Open

[TIR] Add cooperative_tensor builtins and metal.cooperative_tensor storage scope apache/tvm#19423

Open

oraluben added 2 commits April 26, 2026 11:55

fix: add __slots__ for _inst in PyFunctionPass

d7158e9

The cython backend uses strict __slots__ on ObjectBase, which prevents setting _inst on PyFunctionPass without declaring it. This works with ctypes backend (which has __dict__) but fails on CI with cython.

Revert "fix: add __slots__ for _inst in PyFunctionPass"

2bf4e46

This reverts commit d7158e9.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TIR] Add cooperative_tensor builtins and metal.cooperative_tensor storage scope#34

[TIR] Add cooperative_tensor builtins and metal.cooperative_tensor storage scope#34
oraluben wants to merge 4 commits intotile-ai:tilelang_mainfrom
oraluben:metal-cooperative-tensor

oraluben commented Apr 20, 2026

Uh oh!

oraluben commented Apr 26, 2026 •

edited

Loading

Uh oh!

oraluben commented Apr 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

oraluben commented Apr 20, 2026

Summary

Motivation

Changes

Uh oh!

oraluben commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oraluben commented Apr 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

oraluben commented Apr 26, 2026 •

edited

Loading