NKI compiler emits trn2-only shared-memory on trn1 for fused XLA graph #39

@scttfrdmn

Status: open. This is an AWS Neuron SDK limitation, not a trntensor bug.

Symptom: When the full DF-MP2 pipeline runs with all operands pre-pinned on XLA —

B_x = trntensor.ao_to_mo_transform(eri_x, C_occ_x, C_vir_x)  # OK
E_x = trntensor.mp2_energy(B_x, eps_occ_x, eps_vir_x)         # FAILS

— the NKI compiler raises:

Shared memory is only supported on trn2, but inst__I-7-0:_mem_0_0_set
is using Shared memory on an unsupported target

The combined XLA lazy graph spanning both kernels triggers a code-gen path that chooses trn2-specific shared-memory instructions. On trn1 the result fails at the verifier. Each kernel compiles fine when called in isolation.

What we tried:

Practical effect: Users must currently call from_xla on B between the two kernel calls, which defeats residency for this specific pipeline. test_pipeline_composition is marked @pytest.mark.skip with a reference to this issue; the other residency tests (test_matmul_stays_on_xla, test_residency_speedup) work fine.
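A minimal sketch of the workaround described above, for anyone hitting this in the meantime. The helper name mp2_with_host_roundtrip is hypothetical, and the three callables are passed in rather than imported because their exact import paths and signatures are assumptions here (in particular, that from_xla takes one device tensor and returns a host copy, and that mp2_energy accepts a host-side B alongside the XLA-resident eps operands).

```python
from typing import Any, Callable


def mp2_with_host_roundtrip(
    ao_to_mo_transform: Callable[[Any, Any, Any], Any],
    mp2_energy: Callable[[Any, Any, Any], Any],
    from_xla: Callable[[Any], Any],
    eri_x: Any,
    C_occ_x: Any,
    C_vir_x: Any,
    eps_occ_x: Any,
    eps_vir_x: Any,
) -> Any:
    """Run the two kernels with a host round-trip of B in between.

    Hypothetical helper: pulling B off the device is meant to keep the
    two kernels out of one fused lazy graph, at the cost of residency.
    """
    # First kernel: reported to compile fine on its own.
    B_x = ao_to_mo_transform(eri_x, C_occ_x, C_vir_x)
    # Bring B back to the host (assumed from_xla semantics).
    B = from_xla(B_x)
    # Second kernel now starts from a host-side B.
    return mp2_energy(B, eps_occ_x, eps_vir_x)
```

With trntensor this would presumably be called with trntensor.ao_to_mo_transform, trntensor.mp2_energy, and whatever from_xla entry point the library exposes; passing them in keeps the sketch independent of those details.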

Escalation path: per the NKI error message, open an AWS Neuron SDK issue at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. Attach the HLO module dump from a reproduction run with XLA_IR_DEBUG=1 and XLA_HLO_DEBUG=1 set.
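For a reproduction script, the two variables named above can be set from Python instead of the shell. This is a sketch only: the helper name enable_xla_debug_env is hypothetical, and it assumes the variables are read when the XLA runtime starts, so it should be called before torch_xla (or anything that imports it) is loaded.

```python
import os
from typing import MutableMapping, Optional


def enable_xla_debug_env(
    env: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Set XLA_IR_DEBUG and XLA_HLO_DEBUG to "1" in the given mapping.

    Defaults to os.environ; a plain dict can be passed for testing or
    for building the env argument of a subprocess call.
    """
    target = os.environ if env is None else env
    target["XLA_IR_DEBUG"] = "1"
    target["XLA_HLO_DEBUG"] = "1"
    return target
```

Calling enable_xla_debug_env() at the very top of the reproduction script, before any other imports, is the safest ordering under that assumption.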

Linked: #34 (residency, shipped), #35 (mark_step investigation — this now has a concrete case), #38 (eps reshape — closing separately because the reshape is still correct).

Labels: infra (CI, tooling, deployment, repo hygiene)
