NKI compiler emits trn2-only shared-memory on trn1 for fused XLA graph #39

**Status:** open AWS SDK limitation, not a trntensor bug.

**Symptom:** When the full DF-MP2 pipeline runs with all operands pre-pinned on XLA —

```python
B_x = trntensor.ao_to_mo_transform(eri_x, C_occ_x, C_vir_x)  # OK
E_x = trntensor.mp2_energy(B_x, eps_occ_x, eps_vir_x)         # FAILS
```

— the NKI compiler raises:
```
Shared memory is only supported on trn2, but inst__I-7-0:_mem_0_0_set
is using Shared memory on an unsupported target
```

The combined XLA lazy graph spanning both kernels triggers a code-gen path that chooses trn2-specific shared memory instructions. On trn1 this fails at the verifier. Our individual kernels compile fine when called in isolation.

**What we tried:**
- 2D-eps reshape at dispatch (#38) — valid fix for the 1D-load ambiguity, but doesn't avoid this compiler path.
- `xm.mark_step()` in the `_to_xla` fast-path when operands are pre-pinned: this forces the graph to flush, but the flush itself is what produces the trn2-only code.

**Practical effect:** Users must currently call `from_xla` on `B_x` between the two kernel calls, which defeats residency for this specific pipeline. `test_pipeline_composition` is marked `@pytest.mark.skip` with a reference to this issue; the other residency tests (`test_matmul_stays_on_xla`, `test_residency_speedup`) pass.
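A minimal sketch of that workaround, for reference. It assumes `from_xla` takes a single tensor and returns a copy that `mp2_energy` accepts, which this issue does not confirm. The helper name `mp2_energy_with_roundtrip` is hypothetical, and the module is passed in as `tt` so the call order is explicit:

```python
def mp2_energy_with_roundtrip(tt, eri_x, C_occ_x, C_vir_x, eps_occ_x, eps_vir_x):
    """Run the two DF-MP2 kernels with a from_xla round-trip on B in between.

    `tt` is the trntensor module (or any object exposing the same three
    functions). Hypothetical helper, not part of trntensor.
    """
    # First kernel: compiles fine on its own.
    B_x = tt.ao_to_mo_transform(eri_x, C_occ_x, C_vir_x)

    # Assumption: from_xla(tensor) pulls B off the XLA device, so the two
    # kernels no longer share one lazy graph. This is what costs residency.
    B = tt.from_xla(B_x)

    # Second kernel: now traced separately from the first.
    return tt.mp2_energy(B, eps_occ_x, eps_vir_x)
```

Usage would look like `E = mp2_energy_with_roundtrip(trntensor, eri_x, C_occ_x, C_vir_x, eps_occ_x, eps_vir_x)`. Only `B` is round-tripped here; whether the eps operands can stay pinned is untested in this sketch.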

**Escalation path:** per the NKI error message, open an AWS Neuron SDK issue at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. Attach the HLO module dump from a reproduction run with `XLA_IR_DEBUG=1` and `XLA_HLO_DEBUG=1` set.
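One way to make the reproduction run repeatable is a small launcher that sets both variables for a child process. This is a sketch only: the script path is a placeholder, the helper names are hypothetical, and it sets just the two variables named above. Where the dump ends up depends on the local torch-xla / Neuron setup and is not specified in this issue.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of `base` (default: os.environ) with XLA debug flags set."""
    env = dict(os.environ if base is None else base)
    env["XLA_IR_DEBUG"] = "1"
    env["XLA_HLO_DEBUG"] = "1"
    return env


def run_repro(script_path):
    """Run a reproduction script in a child process with the debug flags set.

    `script_path` is whatever file holds the two-kernel snippet from the
    Symptom section; no particular filename is assumed.
    """
    return subprocess.run(
        [sys.executable, script_path],
        env=debug_env(),
        capture_output=True,  # keep stdout/stderr so they can be attached
        text=True,
    )
```

The `stderr` of the returned `CompletedProcess` should contain the "Shared memory is only supported on trn2" message if the failure reproduces.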

**Linked:** #34 (residency, shipped), #35 (mark_step investigation — this now has a concrete case), #38 (eps reshape — closing separately because the reshape is still correct).
