Release v0.74.0-dev20260627 · tenstorrent/tt-metal

Note

If you are installing from a release, refer to the README, INSTALLATION instructions, and any other documentation packaged with that release, not the versions on the main branch. The latest main may differ from the previous release.

The changelog below lists the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/28273668996

LLK (low-level kernels)

Comparison-to-zero SFPU - Compute API PR 47804
ring_mla reduce_trigger determinism: two-phase handshake fix PR 48061
Feature: Add Quasar SFPU TopK kernel (local-sort / merge / rebuild) PR 44729
Enable square SFPU op PR 47813
Unpack tilize operands kernel compatible with reduce PR 47247
Add 32bit transpose-dest with unpack_to_dest PR 47936
Fix LLK Blackhole CI timeout PR 48254
fast-untilize BFP: write SCRATCH_SEC0_val via ordered WRCFG PR 48161
Add GeLU kernel that uses tanh approximation PR 47399

Metalium (tt-metal core)

#45821: clear program cache on MeshDevice reconfiguration PR 47921
#48155: plumb quasar through stack to allow models pytests PR 48158
Comparison-to-zero SFPU - Compute API PR 47804
ring_mla reduce_trigger determinism: two-phase handshake fix PR 48061
Feature: Add Quasar SFPU TopK kernel (local-sort / merge / rebuild) PR 44729
Enable square SFPU op PR 47813
Add 32bit transpose-dest with unpack_to_dest PR 47936
Add support for Quasar device read/write to host over PCIe PR 47888
SFPI 7.62.0 722 PR 48282
LocalTensorAccessor: node-local tensor access (legacy + Metal 2.0) PR 48190
More descriptive error on cache miss in trace capture PR 46535
Add GeLU kernel that uses tanh approximation PR 47399
H2D stream service perf optimizations PR 47857

TT-NN

#47797: port more experimental/quasar ops to metal 2.0 api PR 47853
tilize/to_layout: support FP8_E4M3 input (-> any float TILE format) PR 48046
Adding padding awareness to moe_grouped_gate and dispatch PR 44272
fix hash collision 45821 all reduce create qkv heads device operation PR 47527
clamp_bw / clip_bw: fix inverted ge/le in tensor both-bounds branch PR 48204
Alexey zaytsev epam/fix hash collision 45821 dropout PR 47345
Fix [CI] test_pow_fractional_composite PCC PR 48227
#48155: plumb quasar through stack to allow models pytests PR 48158
#23179: INT32 and UINT32 large scalars in binary PR 48209
ring_mla reduce_trigger determinism: two-phase handshake fix PR 48061
group_norm: reject ROW_MAJOR interleaved input instead of hanging (#47972) PR 48143
indexer_score: MiniMax M3 GQA support (indexer_score_msa) PR 48205
Fix LLK assert sanity coverage PR 48067
Add 32bit transpose-dest with unpack_to_dest PR 47936
experimental/deepseek_prefill: migrate kernels to Device 2.0 PR 47137
fix hash collision 45821 ring joint sdpa device operation ring attention all gather async device operation PR 47559
fix hash collision 45821 all reduce async device operation PR 47526
fix hash collision 45821 minimal matmul strided reduce scatter async PR 47529
Alexey zaytsev epam/fix hash collision 45821 reduce scatter minimal async device operation PR 47531
fix hash collision 45821 neighbor pad async device operation PR 47530
Alexey zaytsev epam/fix hash collision 45821 slice reshard async device operation PR 47534
Alexey zaytsev epam/fix hash collision 45821 strided all gather minimal matmul async PR 47535
Migrate moreh dataflow kernels to Device 2.0 API PR 47923
Ign/reduce sum int32 PR 44061
sparse_sdpa_msa: add native GQA support for MSA prefill PR 48045
Refactor TTNN comparison mode feature to work with ttnn.graph_report PR 45448
Add GeLU kernel that uses tanh approximation PR 47399
H2D stream service perf optimizations PR 47857

tt-train

Bug fix: Remove int conversion for head_dim PR 48272

Models

#48195: adjust resnet50 quasar test for flexible grids PR 48196
GPT-OSS: fix 120B router-weights test via realistic dummy weight scale PR 47970
Fix Stable Diffusion CI errors PR 48202
Adding padding awareness to moe_grouped_gate and dispatch PR 44272
Qwen3-32B bringup to TTTv2 PR 47353
#48155: plumb quasar through stack to allow models pytests PR 48158
Pipeline Prefill PR 47420
qwen25_vl: fix Qwen2.5-VL-32B on-device decode gibberish on wh_llmbox_perf (#48037) PR 47822
Llama-3.3-70B bringup to TTTv2 PR 47350
Remove skip_for_BH and add op tests to L2 tests pipeline PR 48063
Fix LLK assert sanity coverage PR 48067
#48242: point more resnet50/quasar op calls to experimental/quasar ops PR 48248
precise pipeline prefill chunk timing + code_debug default for kimi PR 48246
Qwen2.5-7B model bringup to TTTv2 PR 43814
Falcon-40B tests: fix transformers 5.x silent weight-load failure (PCC/NaN) — #47924 PR 47929
Manifest-Driven Prefill Migration Tests PR 48004
Add GeLU kernel that uses tanh approximation PR 47399
H2D stream service perf optimizations PR 47857

Infrastructure & CI

Adding padding awareness to moe_grouped_gate and dispatch PR 44272
Removing install_debugger.sh script and updating CODEOWNERS PR 48234
indexer_score: MiniMax M3 GQA support (indexer_score_msa) PR 48205
Remove skip_for_BH and add op tests to L2 tests pipeline PR 48063
Disable Galaxy DiT Flux.1 perf test (bricks runners with TLB-window leak) PR 47968
CI: Reduce Tier 2 Models Unit pipeline frequency to once a day PR 48223
Remove Llama from Galaxy stress pipeline (#47407) PR 48247
sparse_sdpa_msa: add native GQA support for MSA prefill PR 48045
Make mmfusedreduce codeowners for resnet50/quasar PR 48253
Remove Flux.1 performance tests from the glx perf tests workflow PR 48286
Remove Flux.1 performance tests from the T3K tests workflow PR 48284
Bump ttsim version to v1.9.2 PR 48310

Documentation

indexer_score: MiniMax M3 GQA support (indexer_score_msa) PR 48205
sparse_sdpa_msa: add native GQA support for MSA prefill PR 48045

Other

#48155: plumb quasar through stack to allow models pytests PR 48158
ring_mla reduce_trigger determinism: two-phase handshake fix PR 48061
ci: add check-merge-conflict hook to pre-commit config PR 48237
Refactor TTNN comparison mode feature to work with ttnn.graph_report PR 45448
