Status: open AWS SDK limitation, not a trntensor bug.
Symptom: When the full DF-MP2 pipeline runs with all operands pre-pinned on XLA —
B_x = trntensor.ao_to_mo_transform(eri_x, C_occ_x, C_vir_x) # OK
E_x = trntensor.mp2_energy(B_x, eps_occ_x, eps_vir_x) # FAILS
— the NKI compiler raises:
Shared memory is only supported on trn2, but inst__I-7-0:_mem_0_0_set
is using Shared memory on an unsupported target
The combined XLA lazy graph spanning both kernels triggers a code-gen path that chooses trn2-specific shared memory instructions. On trn1 this fails at the verifier. Our individual kernels compile fine when called in isolation.
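Since each kernel compiles on its own, one way to keep the two out of a single lazy graph is to bring B back to the host between the calls. The following is a minimal sketch of that call order, not the library's implementation. It assumes from_xla is reachable on the same namespace as the two kernels (the issue does not say where it lives). The namespace is passed in as a hypothetical parameter `tt`, so the function can be exercised against a stub without Neuron hardware.

```python
def mp2_energy_with_host_roundtrip(tt, eri_x, C_occ_x, C_vir_x, eps_occ, eps_vir):
    """Run the two DF-MP2 kernels with B materialized on the host in between.

    `tt` stands in for the trntensor namespace (hypothetical parameter).
    """
    # First kernel: compiles fine when it is the only kernel in the graph.
    B_x = tt.ao_to_mo_transform(eri_x, C_occ_x, C_vir_x)

    # Assumption: from_xla sits next to the kernels on `tt`.
    # Pulling B to the host ends the lazy graph before the second kernel.
    B = tt.from_xla(B_x)

    # Second kernel now compiles in its own graph. Whether eps_occ and
    # eps_vir must also be host tensors here is not stated in the issue.
    return tt.mp2_energy(B, eps_occ, eps_vir)
```

The cost is the one this issue is about: B leaves the device, so residency is lost for this pipeline until the SDK limitation is resolved.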
What we tried:
xm.mark_step() in the _to_xla fast-path when operands are pre-pinned. This forces the graph to flush, but the flush itself is what produces the trn2-only code.
Practical effect: Users must currently from_xla B between the two kernel calls (defeats residency for this specific pipeline). test_pipeline_composition is @pytest.mark.skip'd referencing this issue; other residency tests (test_matmul_stays_on_xla, test_residency_speedup) work fine.
Escalation path: per the NKI error message, open an AWS Neuron SDK issue at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. Attach the HLO module dump (set XLA_IR_DEBUG=1 + XLA_HLO_DEBUG=1) from a reproduction.
Linked: #34 (residency, shipped), #35 (mark_step investigation; this now has a concrete case), #38 (eps reshape; closing separately because the reshape is still correct).
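For the reproduction attached to the SDK issue, the two debug variables need to be in the environment before the XLA runtime starts. A hedged sketch of one way to do that is to launch the reproduction script in a child process with the variables set. The helper names and the script path are hypothetical. The sketch only sets the two variables named above; whatever mechanism writes the dump to disk is outside its scope.

```python
import os
import subprocess
import sys


def xla_debug_env(base=None):
    """Return a copy of the environment with the two debug variables set."""
    env = dict(os.environ if base is None else base)
    env["XLA_IR_DEBUG"] = "1"
    env["XLA_HLO_DEBUG"] = "1"
    return env


def run_repro(script_path):
    """Run a reproduction script (hypothetical path) with the variables set.

    A child process is used so the variables are present from interpreter
    start, independent of import order in the script.
    """
    return subprocess.run(
        [sys.executable, script_path],
        env=xla_debug_env(),
        capture_output=True,
        text=True,
    )
```

Calling run_repro on a script containing the two failing lines from the symptom returns a CompletedProcess whose stderr can be attached alongside the dump.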