K-tiling for ao_to_mo_transform (nbasis > 128) #37

@scttfrdmn

Description

The initial ao_to_mo_transform kernel (shipped in 36564cd) is single-tile only: nbasis ≤ 128, nocc * naux ≤ 512, nvir * naux ≤ 512. Larger shapes raise NotImplementedError.

Most cc-pVDZ molecules have nbasis ≤ 128, but cc-pVTZ and above exceed it. K-tiling over the μ (and ν, in step 2) axis would extend the kernel to larger systems.

Approach: add a K-tile loop around each nc_matmul, accumulating the partial products in PSUM across tiles. The matmul_kernel already has this pattern for the single-GEMM case; port that structure into the two-step fused kernel.
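For reference, the accumulation pattern can be sketched in plain Python. This is not NKI code and not the existing matmul_kernel: `matmul_ktiled` and `K_TILE` are hypothetical names, the 128 limit is taken from the current nbasis ≤ 128 bound, and the sketch assumes both operands carry the contraction (K) axis as their leading axis.

```python
# Plain-Python sketch of K-tiled accumulation; not NKI code.
K_TILE = 128  # assumed per-tile limit on the contraction axis


def matmul_ktiled(a_t, b, k_tile=K_TILE):
    """Compute a_t^T @ b, contracting over the leading (K) axis in tiles.

    a_t is K x M and b is K x N, both given as lists of rows.
    """
    k_total, m, n = len(a_t), len(a_t[0]), len(b[0])
    acc = [[0.0] * n for _ in range(m)]  # stands in for the PSUM accumulator
    for k0 in range(0, k_total, k_tile):
        k1 = min(k0 + k_tile, k_total)  # the last tile may be ragged
        # One partial matmul per K-tile, accumulated into acc.
        for k in range(k0, k1):
            row_a, row_b = a_t[k], b[k]
            for i in range(m):
                a_ki = row_a[i]
                for j in range(n):
                    acc[i][j] += a_ki * row_b[j]
    return acc
```

The result should not depend on `k_tile`, so a ragged K (for example 200 with a tile of 128) can be checked against a single-tile run at small shapes before moving to hardware tests.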

Out of scope here: optimizing the intermediate's HBM round-trip (the intermediate currently goes through kernel-scratch HBM between the two matmul steps to handle the partition-dim change). A follow-up could explore an in-SBUF transpose primitive if NKI adds one.

Acceptance: ao_to_mo_transform succeeds for nbasis up to at least 256, with nocc * naux and nvir * naux each up to 2048; hardware tests at a cc-pVTZ-representative shape pass.
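To make the target shapes concrete, here is a small sketch of the tile arithmetic. `tile_ranges` is a hypothetical helper, and treating 128 and 512 as per-tile limits on the contraction and free axes is an assumption carried over from the current single-tile bounds, not something the kernel is known to require.

```python
def tile_ranges(extent, tile):
    """Return (start, stop) pairs that cover range(extent) in chunks of at most `tile`."""
    return [(start, min(start + tile, extent)) for start in range(0, extent, tile)]


# Contraction axis at the acceptance bound: nbasis = 256, assumed tile of 128.
print(tile_ranges(256, 128))   # [(0, 128), (128, 256)]

# A ragged case: nbasis = 200 leaves a short final tile.
print(tile_ranges(200, 128))   # [(0, 128), (128, 200)]

# Free axis at the acceptance bound: 2048 elements, assumed tile of 512.
print(len(tile_ranges(2048, 512)))   # 4
```

Under these assumptions the acceptance shape needs two K-tiles per matmul step. If 512 remains a per-tile limit on the free axis, the 2048 target would also need tiling along nocc * naux and nvir * naux, which the K-tile loop alone does not cover.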

Labels: feature (New feature or request)
