Release v0.74.0-dev20260626 · tenstorrent/tt-metal

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with that release, not the versions on the main branch. The latest main branch may differ from the previous release.

The changelog below lists the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/28210418510

LLK (low-level kernels)

Simplify llk smoke CI PR 47920
Add generalized_moe_gate op on BH PR 47673
fix: eliminate deprecated implicit cast in wh sfpu reciprocal kernel PR 48022
ci: speed up ttsim llk test runs PR 48047
Use functional cast, not static_cast, optimize x^n PR 47436
Use sfpi::as not sfpi::reinterpret PR 48055
SFPU in parallel with FPU PR 47284
Feature: Quasar SFPU Compute API for Gelu PR 47860
Feature: Quasar SFPU Compute API for Tanh PR 47680

Metalium (tt-metal core)

Temporarily disable watcher NOC sanitization on Quasar in fast dispatch mode PR 47892
Bug fix: Fix profiler flush in dynamic noc mode PR 37642
SFPI 7.61.0 719 PR 47976
Fix int fpu datacopy MOP wrongly enabled on WH/BH in copy_tile_to_dst PR 47985
Performance & robustness pass on the UniAD TTNN port PR 45694
Add generalized_moe_gate op on BH PR 47673
Forbid DM-kernel DFB self-loops in ProgramSpec validation PR 48012
Automated UMD Bump 24.06.2026 PR 47948
Make DFB PACK finish_impl wait for UNPACK drain on INTRA self-loop PR 45158
Back run-arg Tables with a small-vector PR 48060
Feature: Quasar SFPU Compute API for Gelu PR 47860
Fix sharded layernorm JIT compile under x86 clang PR 47828
emule: add -I <src>/ttnn to JIT include flags PR 47678
Feature: Quasar SFPU Compute API for Tanh PR 47680
Dispatch Telemetry Add Utilization Monitoring PR 46672
#48160: adjust grid of quasar_32_arch.yaml PR 48162

TT-NN

Add dtype validation PR 47019
Bug fix: Fix LLK asserts sanity nightly run PR 47917
per-device chunk_start from mesh coordinate (ring_joint_sdpa style) PR 47939
Fix INT32 type conversion in tilize with padding PR 47977
ds_prefill - Combine operator changing layout of data processing to be expert-centric PR 47784
ds_prefill - Adding OP tests (except gate tests) for GLM 5.1, Minimax M2.7, Kimi K2.6, DeepSeek V4 Pro, DeepSeek V4 Flash and GPT-OSS 120B PR 47325
restrict D2H host-IO sweeps tests to Blackhole Galaxy PR 47973
feat: ROW_MAJOR + fp8_e4m3 support for update_padded_kv_cache PR 47536
Add causal GQA support for ring joint SDPA PR 47946
matmul: fix batch broadcast when A batch=1 and B batch>1 PR 47640
conv1d: enable DRAM slicing by default, fix 4D weights & depthwise CB deadlock PR 47927
DS Prefill: D2D socket-sync op PR 47918
Make Routed Expert Write Output in Dispatch Buffer PR 47627
#23333: Enable skipped integer tests PR 48048
Bug Fix: Correct reduce scatter M non-determinism PR 47831
Add git repo url and commit hash to database PR 43830
Bug fix: Copying of mesh descriptor files to TTNN reports in multihost PR 48006
perf(groupnorm): reserve mask stride vector and hoist index math PR 48036

tt-train

Training service alignment (Cleanup) PR 43753
Make parallel RNG deterministic PR 47431

Models

Bug fix: Fix LLK asserts sanity nightly run PR 47917
Add Generalize MoEGate module for WH PR 47705
Llama-3.2-3B bring-up PR 47349
Use ttnn.lerp instead of the explicit form PR 47868
Llama 3.2 1B bringup to TTTv2 PR 47342
ds_prefill - Combine operator changing layout of data processing to be expert-centric PR 47784
Performance & robustness pass on the UniAD TTNN port PR 45694
ds_prefill - Adding OP tests (except gate tests) for GLM 5.1, Minimax M2.7, Kimi K2.6, DeepSeek V4 Pro, DeepSeek V4 Flash and GPT-OSS 120B PR 47325
Add generalized_moe_gate op on BH PR 47673
restrict D2H host-IO sweeps tests to Blackhole Galaxy PR 47973
feat: ROW_MAJOR + fp8_e4m3 support for update_padded_kv_cache PR 47536
GPT-OSS: migrate to transformers 5.10.2 (unpin model-specific version) PR 47671
Test Only: Fix scheduled tier-2/3 CI — Phi-3-mini, Mixtral-8x7B, Qwen2.5-VL, Llama-3.1-8B (transformers 5.x fallout + timeouts + perf target) PR 47802
DS Prefill: D2D socket-sync op PR 47918
Mask padded vocab logits in sampling PR 47021
Make Routed Expert Write Output in Dispatch Buffer PR 47627
perf(pcc): comp_pcc use torch.corrcoef instead of np.ma.corrcoef PR 46638
Fix decode_forward crash for row-sharded models (GPT-OSS) under vLLM async scheduling PR 48042
Batched prefill disabled for Llama 3.1 8B on P150 PR 47945
Fix MAX_GEN_LEN param for Whisper after transformers version upgrade PR 48154
ds_Prefill - CI timeout fix and failing tests skipping in order to make green CI PR 48068

Infrastructure & CI

Bug fix: Fix LLK asserts sanity nightly run PR 47917
per-device chunk_start from mesh coordinate (ring_joint_sdpa style) PR 47939
Simplify llk smoke CI PR 47920
ds_prefill - Adding OP tests (except gate tests) for GLM 5.1, Minimax M2.7, Kimi K2.6, DeepSeek V4 Pro, DeepSeek V4 Flash and GPT-OSS 120B PR 47325
ci: fix code-analysis startup_failure on schedule/dispatch triggers [skip ci] PR 48013
Copilot review: suggest clearer PR titles (MINFRA-978) PR 47978
GPT-OSS: migrate to transformers 5.10.2 (unpin model-specific version) PR 47671
Test Only: Fix scheduled tier-2/3 CI — Phi-3-mini, Mixtral-8x7B, Qwen2.5-VL, Llama-3.1-8B (transformers 5.x fallout + timeouts + perf target) PR 47802
Comment sp4 tests PR 48050
Add --profile mode and machine-readable outputs to run_safe_pytest.sh PR 48039
upstream tests: tag upstream docker containers with release-latest and release- PR 47869
Add nightly cron to fabric cpu unit tests and add test scaffolding for merge gate PR 48001
ci: move blackhole sdxl test from bh-demos to shield-sdxl in L2 nightly PR 48070
Removed test from run_tg_frequent_tests.sh which was absorbed by the … PR 48146
fix(bundle-python): bundle CPython into _python/ subdir, fix pyvenv.cfg home PR 47997
Dispatch Telemetry Add Utilization Monitoring PR 46672
ds_Prefill - CI timeout fix and failing tests skipping in order to make green CI PR 48068
Fix bh_lb_DeepSeek_PREFILL_PERF timeout PR 48184

Other

Bug fix: Fix LLK asserts sanity nightly run PR 47917
Add causal GQA support for ring joint SDPA PR 47946
