Release v0.4.0 — DF-MP2 end-to-end validation + NKI energy kernel · trnsci/trnblas

Highlights

Real-molecule DF-MP2 validation against PySCF (#11): trnblas matches PySCF's own mp.dfmp2.DFMP2 reference to nanohartree precision on H2O/STO-3G, H2O/cc-pvdz, CH4/cc-pvdz, and NH3/cc-pvdz. Adds a new pip install trnblas[pyscf] extra and a runnable examples/df_mp2_pyscf.py demo.
Fused MP2 energy-reduction NKI kernel (#15, Phase 1): adds trnblas.nki.nki_mp2_energy, which sub-tiles along the partition dimension. Validated on trn1 for nvir ∈ {8, 16, 64, 256, 448}. This release lands the kernel scaffold; further performance work is tracked under #15.
DF-MP2 step-4 collapse (#14): the energy reduction now issues one chunked GEMM instead of nocc² sequential batched dispatches, using the algebraic identity T_full = X @ X.T. Timings on trn1.2xlarge:

Shape                 Flops (G)   Cold (s)   Warm (s)   TFLOPS (warm)
small (128/16/384)          3.4      0.025      0.008            0.43
medium (512/64/1536)       2757       12.9       9.77            0.28
large (768/96/2304)       20352       65.9       62.8            0.32

Trainium CI infrastructure — Terraform module for a persistent trn1 test instance, SSM-driven runners (scripts/run_neuron_tests.sh, scripts/run_df_mp2_bench.sh), docs at docs/aws_setup.md.
NKI GEMM kernel now calls the real nisa.nc_matmul, with stationary-tile reuse and HBM padding to handle arbitrary shapes. nki_batched_gemm dispatches each slice through the cached kernel. 17/17 hardware tests pass.
Repository transfer — now at trnsci/trnblas. Docs at https://trnsci.dev/trnblas/.
Minimum neuronxcc version raised from >=2.15 to >=2.24 (required for the NKI 2.24+ nc_matmul calling convention), matching the rest of the trnsci suite.
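The step-4 collapse (#14) can be illustrated with a small NumPy sketch. It assumes B holds the density-fitted (ia|P) integrals already contracted with the inverse square root of the Coulomb metric, so that (ia|jb) = sum_P B[i,a,P] B[j,b,P], and it uses the standard closed-shell MP2 energy expression. Reading T_full as the full (ia|jb) matrix is an interpretation, and both function names are hypothetical, not trnblas API. The first function issues one small GEMM per (i, j) pair, standing in for the old nocc² dispatch pattern; the second builds T_full = X @ X.T in row chunks over occupied orbitals, so each chunk is a single GEMM and the exchange term (ib|ja) stays inside the same chunk.

```python
import numpy as np

def mp2_energy_pairwise(B, e_occ, e_vir):
    """Hypothetical reference: one small GEMM per occupied pair (nocc**2 calls)."""
    nocc, nvir, _ = B.shape
    e = 0.0
    for i in range(nocc):
        for j in range(nocc):
            T = B[i] @ B[j].T                      # T[a, b] = (ia|jb)
            D = e_occ[i] + e_occ[j] - e_vir[:, None] - e_vir[None, :]
            e += np.sum(T * (2.0 * T - T.T) / D)   # T.T[a, b] = (ib|ja)
    return e

def mp2_energy_collapsed(B, e_occ, e_vir, occ_chunk=4):
    """Hypothetical sketch: same energy from row chunks of T_full = X @ X.T."""
    nocc, nvir, naux = B.shape
    X = B.reshape(nocc * nvir, naux)               # row index = i * nvir + a
    e = 0.0
    for i0 in range(0, nocc, occ_chunk):
        i1 = min(i0 + occ_chunk, nocc)
        # One GEMM per chunk: rows (i, a) for i in [i0, i1), all columns (j, b).
        T = (X[i0 * nvir:i1 * nvir] @ X.T).reshape(i1 - i0, nvir, nocc, nvir)
        Tx = T.transpose(0, 3, 2, 1)               # Tx[i, a, j, b] = (ib|ja)
        D = (e_occ[i0:i1, None, None, None] - e_vir[None, :, None, None]
             + e_occ[None, None, :, None] - e_vir[None, None, None, :])
        e += np.sum(T * (2.0 * T - Tx) / D)
    return e

rng = np.random.default_rng(0)
nocc, nvir, naux = 6, 10, 30
B = 0.1 * rng.standard_normal((nocc, nvir, naux))
e_occ = -1.0 - rng.random(nocc)   # occupied orbital energies (negative)
e_vir = 0.5 + rng.random(nvir)    # virtual orbital energies (positive)

e_ref = mp2_energy_pairwise(B, e_occ, e_vir)
e_new = mp2_energy_collapsed(B, e_occ, e_vir, occ_chunk=4)
print(e_ref, e_new, np.isclose(e_ref, e_new))
```

Chunking over occupied orbitals bounds each GEMM's output to occ_chunk * nvir * nocc * nvir elements instead of the full (nocc * nvir)² matrix, at the cost of a few extra GEMM calls.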
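The partition-dim sub-tiling in nki_mp2_energy (#15) is motivated by NKI's limit of 128 rows on the partition axis of an on-chip tile. The NumPy emulation below is a hypothetical sketch, not the kernel: it assumes the virtual index a sits on the partition axis and reduces one (i, j) pair's energy in row tiles of at most 128 rows, so nvir = 448 becomes tiles of 128, 128, 128 and 64. The helper names are illustrative only.

```python
import numpy as np

PMAX = 128  # assumed partition-dimension limit of one NKI tile

def pair_energy_reference(T, D):
    """Untiled closed-shell energy of one (i, j) pair, with T[a, b] = (ia|jb)."""
    return np.sum(T * (2.0 * T - T.T) / D)

def pair_energy_subtiled(T, D, pmax=PMAX):
    """Same reduction, split along a (the assumed partition axis) into tiles."""
    nvir = T.shape[0]
    acc = 0.0
    for p0 in range(0, nvir, pmax):
        p1 = min(p0 + pmax, nvir)
        t = T[p0:p1, :]       # rows a in [p0, p1) of (ia|jb)
        tx = T[:, p0:p1].T    # the same rows of the exchange term (ib|ja)
        acc += np.sum(t * (2.0 * t - tx) / D[p0:p1, :])  # partial sum per tile
    return acc

def tile_rows(nvir, pmax=PMAX):
    """Row count of each partition tile for a given nvir."""
    return [min(pmax, nvir - p0) for p0 in range(0, nvir, pmax)]

rng = np.random.default_rng(1)
e_i, e_j = -1.2, -1.5  # illustrative occupied orbital energies
for nvir in (8, 16, 64, 256, 448):
    T = 0.1 * rng.standard_normal((nvir, nvir))
    e_vir = 0.5 + rng.random(nvir)
    D = e_i + e_j - e_vir[:, None] - e_vir[None, :]
    same = np.isclose(pair_energy_subtiled(T, D), pair_energy_reference(T, D))
    print(nvir, tile_rows(nvir), same)
```

The uneven last tile (64 rows for nvir = 448) is the case a fixed-size tiling would miss, which is why validating across several nvir values is useful.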
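The GEMM item (stationary-tile reuse plus HBM padding) follows a familiar blocking pattern. This NumPy sketch models the idea rather than the trnblas kernel: the tile sizes (128 for the contraction and stationary free dimensions, 512 for the moving free dimension) are assumed to reflect nc_matmul limits, and the real instruction's operand layout differs from the plain A @ B used here. Operands are zero-padded up to tile multiples, each stationary tile of A is loaded once and reused against every moving tile of B in its row, and the result is sliced back to the original shape. batched_gemm_per_slice is a hypothetical stand-in for per-slice dispatch and does not model kernel caching.

```python
import numpy as np

# Assumed tile limits: contraction and stationary free dims up to 128,
# moving free dim up to 512.
TILE_K, TILE_M, TILE_N = 128, 128, 512

def _round_up(n, m):
    return -(-n // m) * m

def _pad_to(x, rows, cols):
    """Zero-pad a 2-D array up to (rows, cols); zeros do not change the product."""
    out = np.zeros((rows, cols), dtype=x.dtype)
    out[: x.shape[0], : x.shape[1]] = x
    return out

def tiled_gemm_padded(A, B, tk=TILE_K, tm=TILE_M, tn=TILE_N):
    """C = A @ B for arbitrary shapes via padding and stationary-tile reuse."""
    M, K = A.shape
    K2, N = B.shape
    if K != K2:
        raise ValueError(f"inner dimensions differ: {K} vs {K2}")
    Mp, Kp, Np = _round_up(M, tm), _round_up(K, tk), _round_up(N, tn)
    Ap, Bp = _pad_to(A, Mp, Kp), _pad_to(B, Kp, Np)
    C = np.zeros((Mp, Np), dtype=np.result_type(A, B))
    for m0 in range(0, Mp, tm):
        for k0 in range(0, Kp, tk):
            stationary = Ap[m0:m0 + tm, k0:k0 + tk]  # fetched once per (m, k) ...
            for n0 in range(0, Np, tn):             # ... reused for every N tile
                moving = Bp[k0:k0 + tk, n0:n0 + tn]
                C[m0:m0 + tm, n0:n0 + tn] += stationary @ moving
    return C[:M, :N]                                # drop the padding

def batched_gemm_per_slice(A, B):
    """One tiled GEMM call per batch slice (kernel caching is not modeled)."""
    return np.stack([tiled_gemm_padded(a, b) for a, b in zip(A, B)])

rng = np.random.default_rng(2)
A = rng.standard_normal((300, 200))
B = rng.standard_normal((200, 700))
print(np.allclose(tiled_gemm_padded(A, B), A @ B))

A3 = rng.standard_normal((3, 50, 70))
B3 = rng.standard_normal((3, 70, 20))
print(np.allclose(batched_gemm_per_slice(A3, B3), A3 @ B3))
```

Looping k inside m and n innermost is what lets one stationary tile serve a whole row of output tiles; padding trades some wasted work on the edge tiles for a single code path that handles any shape.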

See the full CHANGELOG for details.

⚠️ Erratum (v0.4.3): The "GEMM per-call kernel timing" and "DF-MP2 end-to-end" tables in these notes reported trn1 numbers that came from a silent torch.matmul fallback on trn1's Xeon CPU, not from NKI on the Tensor Engine. Fixed in v0.4.3.
