v0.74.0-dev20260626

Pre-release
github-actions released this 26 Jun 03:26
· 120 commits to main since this release
Immutable release. Only release title and notes can be modified.
e00ea66

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not those on the main branch. There may be differences between the latest main and the previous release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/28210418510

LLK (low-level kernels)

  • Simplify llk smoke CI PR 47920
  • Add generalized_moe_gate op on BH PR 47673
  • fix: eliminate deprecated implicit cast in wh sfpu reciprocal kernel PR 48022
  • ci: speed up ttsim llk test runs PR 48047
  • Use functional cast, not static_cast, optimize x^n PR 47436
  • Use sfpi::as not sfpi::reinterpret PR 48055
  • SFPU in parallel with FPU PR 47284
  • Feature: Quasar SFPU Compute API for Gelu PR 47860
  • Feature: Quasar SFPU Compute API for Tanh PR 47680

Metalium (tt-metal core)

  • Temporarily disable watcher NOC sanitization on Quasar in fast dispatch mode PR 47892
  • Bug fix: Fix profiler flush in dynamic noc mode PR 37642
  • SFPI 7.61.0 719 PR 47976
  • Fix int fpu datacopy MOP wrongly enabled on WH/BH in copy_tile_to_dst PR 47985
  • Performance & robustness pass on the UniAD TTNN port PR 45694
  • Add generalized_moe_gate op on BH PR 47673
  • Forbid DM-kernel DFB self-loops in ProgramSpec validation PR 48012
  • Automated UMD Bump 24.06.2026 PR 47948
  • Make DFB PACK finish_impl wait for UNPACK drain on INTRA self-loop PR 45158
  • Back run-arg Tables with a small-vector PR 48060
  • Feature: Quasar SFPU Compute API for Gelu PR 47860
  • Fix sharded layernorm JIT compile under x86 clang PR 47828
  • emule: add -I <src>/ttnn to JIT include flags PR 47678
  • Feature: Quasar SFPU Compute API for Tanh PR 47680
  • Dispatch Telemetry Add Utilization Monitoring PR 46672
  • #48160: adjust grid of quasar_32_arch.yaml PR 48162

TT-NN

  • Add dtype validation PR 47019
  • Bug fix: Fix LLK asserts sanity nightly run PR 47917
  • per-device chunk_start from mesh coordinate (ring_joint_sdpa style) PR 47939
  • Fix INT32 type conversion in tilize with padding PR 47977
  • ds_prefill - Combine operator changing layout of data processing to be expert-centric PR 47784
  • ds_prefill - Adding OP tests (except gate tests) for GLM 5.1, Minimax M2.7, Kimi K2.6, DeepSeek V4 Pro, DeepSeek V4 Flash and GPT-OSS 120B PR 47325
  • restrict D2H host-IO sweeps tests to Blackhole Galaxy PR 47973
  • feat: ROW_MAJOR + fp8_e4m3 support for update_padded_kv_cache PR 47536
  • Add causal GQA support for ring joint SDPA PR 47946
  • matmul: fix batch broadcast when A batch=1 and B batch>1 PR 47640
  • conv1d: enable DRAM slicing by default, fix 4D weights & depthwise CB deadlock PR 47927
  • DS Prefill: D2D socket-sync op PR 47918
  • Make Routed Expert Write Output in Dispatch Buffer PR 47627
  • #23333: Enable skipped integer tests PR 48048
  • Bug Fix: Correct reduce scatter M non-determinism PR 47831
  • Add git repo url and commit hash to database PR 43830
  • Bug fix: Copying of mesh descriptor files to TTNN reports in multihost PR 48006
  • perf(groupnorm): reserve mask stride vector and hoist index math PR 48036

tt-train

  • Training service alignment (Cleanup) PR 43753
  • Make parallel RNG deterministic PR 47431

Models

  • Bug fix: Fix LLK asserts sanity nightly run PR 47917
  • Add Generalized MoEGate module for WH PR 47705
  • Llama-3.2-3B bring-up PR 47349
  • Use ttnn.lerp instead of the explicit form PR 47868
  • Llama 3.2 1B bringup to TTTv2 PR 47342
  • ds_prefill - Combine operator changing layout of data processing to be expert-centric PR 47784
  • Performance & robustness pass on the UniAD TTNN port PR 45694
  • ds_prefill - Adding OP tests (except gate tests) for GLM 5.1, Minimax M2.7, Kimi K2.6, DeepSeek V4 Pro, DeepSeek V4 Flash and GPT-OSS 120B PR 47325
  • Add generalized_moe_gate op on BH PR 47673
  • restrict D2H host-IO sweeps tests to Blackhole Galaxy PR 47973
  • feat: ROW_MAJOR + fp8_e4m3 support for update_padded_kv_cache PR 47536
  • GPT-OSS: migrate to transformers 5.10.2 (unpin model-specific version) PR 47671
  • Test Only: Fix scheduled tier-2/3 CI — Phi-3-mini, Mixtral-8x7B, Qwen2.5-VL, Llama-3.1-8B (transformers 5.x fallout + timeouts + perf target) PR 47802
  • DS Prefill: D2D socket-sync op PR 47918
  • Mask padded vocab logits in sampling PR 47021
  • Make Routed Expert Write Output in Dispatch Buffer PR 47627
  • perf(pcc): comp_pcc use torch.corrcoef instead of np.ma.corrcoef PR 46638
  • Fix decode_forward crash for row-sharded models (GPT-OSS) under vLLM async scheduling PR 48042
  • Batched prefill disabled for Llama 3.1 8B on P150 PR 47945
  • Fix MAX_GEN_LEN param for Whisper after transformers version upgrade PR 48154
  • ds_prefill - Fix CI timeout and skip failing tests to get CI green PR 48068

Infrastructure & CI

  • Bug fix: Fix LLK asserts sanity nightly run PR 47917
  • per-device chunk_start from mesh coordinate (ring_joint_sdpa style) PR 47939
  • Simplify llk smoke CI PR 47920
  • ds_prefill - Adding OP tests (except gate tests) for GLM 5.1, Minimax M2.7, Kimi K2.6, DeepSeek V4 Pro, DeepSeek V4 Flash and GPT-OSS 120B PR 47325
  • ci: fix code-analysis startup_failure on schedule/dispatch triggers [skip ci] PR 48013
  • Copilot review: suggest clearer PR titles (MINFRA-978) PR 47978
  • GPT-OSS: migrate to transformers 5.10.2 (unpin model-specific version) PR 47671
  • Test Only: Fix scheduled tier-2/3 CI — Phi-3-mini, Mixtral-8x7B, Qwen2.5-VL, Llama-3.1-8B (transformers 5.x fallout + timeouts + perf target) PR 47802
  • Comment sp4 tests PR 48050
  • Add --profile mode and machine-readable outputs to run_safe_pytest.sh PR 48039
  • upstream tests: tag upstream docker containers with release-latest and release- PR 47869
  • Add nightly cron to fabric cpu unit tests and add test scaffolding for merge gate PR 48001
  • ci: move blackhole sdxl test from bh-demos to shield-sdxl in L2 nightly PR 48070
  • Removed test from run_tg_frequent_tests.sh which was absorbed by the … PR 48146
  • fix(bundle-python): bundle CPython into _python/ subdir, fix pyvenv.cfg home PR 47997
  • Dispatch Telemetry Add Utilization Monitoring PR 46672
  • ds_prefill - Fix CI timeout and skip failing tests to get CI green PR 48068
  • Fix bh_lb_DeepSeek_PREFILL_PERF timeout PR 48184

Other

  • Bug fix: Fix LLK asserts sanity nightly run PR 47917
  • Add causal GQA support for ring joint SDPA PR 47946