K-tiling for ao_to_mo_transform (nbasis > 128) #37

The initial `ao_to_mo_transform` kernel (shipped in 36564cd) is single-tile only: `nbasis ≤ 128`, `nocc * naux ≤ 512`, `nvir * naux ≤ 512`. Larger shapes raise `NotImplementedError`.
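The limits above can be expressed as a small shape guard. This is only a sketch of the stated constraints: the function name `check_single_tile_shape` and the constant names are hypothetical, and the actual check in the kernel may be structured differently.

```python
# Single-tile limits quoted in this issue; the constant names are illustrative.
MAX_NBASIS = 128       # limit on nbasis (the contraction axis)
MAX_FREE_ELEMS = 512   # limit on nocc * naux and on nvir * naux


def check_single_tile_shape(nbasis, nocc, nvir, naux):
    """Raise NotImplementedError if the shape exceeds the single-tile limits.

    Hypothetical helper: it mirrors the documented limits, not the kernel's code.
    """
    if nbasis > MAX_NBASIS:
        raise NotImplementedError(
            f"nbasis={nbasis} exceeds the single-tile limit of {MAX_NBASIS}"
        )
    for label, size in (("nocc * naux", nocc * naux), ("nvir * naux", nvir * naux)):
        if size > MAX_FREE_ELEMS:
            raise NotImplementedError(
                f"{label}={size} exceeds the single-tile limit of {MAX_FREE_ELEMS}"
            )
```

A shape such as `(nbasis=128, nocc=4, nvir=16, naux=32)` sits exactly on the limits and passes; raising `nbasis` to 129 triggers the error.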

Most molecules in the cc-pVDZ basis have nbasis ≤ 128, but cc-pVTZ and larger basis sets exceed that limit. K-tiling over the contraction axis (μ in step 1, ν in step 2) would extend the kernel to these larger systems.

**Approach:** add a K-tile loop around each `nc_matmul`, accumulating partial products in PSUM across tiles. `matmul_kernel` already uses this pattern for the single-GEMM case; port that structure into the two-step fused kernel.
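The accumulation pattern can be illustrated on the host with NumPy. This is a minimal sketch, not the NKI kernel: the accumulator array stands in for PSUM, the tile size of 128 is assumed from the current `nbasis ≤ 128` limit, and `k_tiled_matmul` and `num_k_tiles` are hypothetical names.

```python
import numpy as np

K_TILE = 128  # assumed tile size along the contraction (K) axis


def num_k_tiles(k, k_tile=K_TILE):
    """Number of K-tiles needed to cover a contraction axis of length k."""
    return -(-k // k_tile)  # ceiling division


def k_tiled_matmul(lhs_t, rhs, k_tile=K_TILE):
    """Compute lhs_t.T @ rhs by summing partial products over K-tiles.

    lhs_t has shape (K, M) and rhs has shape (K, N). The accumulator plays
    the role that PSUM would play on the device.
    """
    k, m = lhs_t.shape
    if rhs.shape[0] != k:
        raise ValueError("lhs_t and rhs must share the contraction axis")
    acc = np.zeros((m, rhs.shape[1]), dtype=np.result_type(lhs_t, rhs))
    for k0 in range(0, k, k_tile):
        k1 = min(k0 + k_tile, k)  # the last tile may be partial
        acc += lhs_t[k0:k1].T @ rhs[k0:k1]
    return acc
```

For a contraction length of 256 the loop runs twice; a length that is not a multiple of the tile size simply ends with a shorter final tile.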

**Out of scope here:** optimizing the intermediate's HBM round-trip. The intermediate currently passes through kernel-scratch HBM between the two matmul steps to handle the change in partition dimension. A follow-up could explore an in-SBUF transpose primitive if NKI adds one.

**Acceptance:** `ao_to_mo_transform` succeeds for nbasis up to at least 256, with `nocc * naux` and `nvir * naux` each up to 2048; hardware tests at a cc-pVTZ-representative shape pass.
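A host-side reference for such a test could look like the sketch below. It assumes the transform is (ia|P) = Σ_μν C_occ[μ,i] C_vir[ν,a] B[μ,ν,P], with μ contracted in step 1 and ν in step 2; that index convention, the argument layout, and the names `ao_to_mo_reference` and `ao_to_mo_k_tiled` are assumptions for illustration, not the kernel's actual interface.

```python
import numpy as np


def ao_to_mo_reference(c_occ, c_vir, b_ao):
    """Untiled reference: contract mu and nu in a single einsum (assumed convention)."""
    return np.einsum("mi,na,mnp->iap", c_occ, c_vir, b_ao, optimize=True)


def ao_to_mo_k_tiled(c_occ, c_vir, b_ao, k_tile=128):
    """Two-step transform with a K-tile loop around each contraction.

    c_occ: (nbasis, nocc), c_vir: (nbasis, nvir), b_ao: (nbasis, nbasis, naux).
    """
    nbasis, nocc = c_occ.shape
    nvir = c_vir.shape[1]
    naux = b_ao.shape[2]

    # Step 1: contract over mu, accumulating one mu-tile at a time.
    tmp = np.zeros((nocc, nbasis, naux))
    for m0 in range(0, nbasis, k_tile):
        m1 = min(m0 + k_tile, nbasis)
        tmp += np.einsum("mi,mnp->inp", c_occ[m0:m1], b_ao[m0:m1])

    # Step 2: contract over nu, accumulating one nu-tile at a time.
    out = np.zeros((nocc, nvir, naux))
    for n0 in range(0, nbasis, k_tile):
        n1 = min(n0 + k_tile, nbasis)
        out += np.einsum("na,inp->iap", c_vir[n0:n1], tmp[:, n0:n1])
    return out
```

A test at, say, nbasis = 256, nocc = 8, nvir = 64, naux = 32 (so `nvir * naux` = 2048) can compare the tiled result against the reference with `np.allclose`; a hardware test would compare the kernel output against the same reference with a tolerance suited to the kernel's dtype.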
