tenstorrent/tt-metal

TT-NN is a Python & C++ Neural Network OP library.

Latest Releases

| Release | Release Date |
|---|---|
| 0.59.0 | ETA Jun 18, 2025 |
| 0.58.0 | May 13, 2025 |
| 0.57.0 | Apr 15, 2025 |
| 0.56.0 | Mar 7, 2025 |

LLMs

| Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s | TT-Metalium Release | vLLM Tenstorrent Repo Release |
|---|---|---|---|---|---|---|---|---|
| Qwen 3 32B (TP=8) | 32 | QuietBox | 115 | 21.6 | 30 | 691.2 | v0.59.0-rc42 | 09d8387 |
| QwQ 32B (TP=8) | 32 | QuietBox | 133 | 25.2 | 30 | 806.4 | v0.56.0-rc51 | e2e0002 |
| DeepSeek R1 Distill Llama 3.3 70B (TP=8) | 32 | QuietBox | 159 | 15.7 | 20 | 502.4 | v0.59.0-rc42 | 09d8387 |
| Llama 3.1 70B (TP=32) | 32 | Galaxy | 105 | 56.0 | 80 | 1792.0 | v0.59.0-rc40 | 09d8387 |
| Llama 3.1 70B (TP=8) | 32 | QuietBox | 159 | 15.7 | 20 | 502.4 | v0.59.0-rc42 | 09d8387 |
| Llama 3.2 11B Vision (TP=2) | 16 | n300 | 2550 | 15.8 | 17 | 252.8 | v0.56.0-rc6 | e2e0002 |
| Qwen 2.5 7B (TP=2) | 32 | n300 | 126 | 32.5 | 38 | 1040.0 | v0.56.0-rc33 | e2e0002 |
| Qwen 2.5 72B (TP=8) | 32 | QuietBox | 333 | 14.5 | 20 | 464.0 | v0.56.0-rc33 | e2e0002 |
| Falcon 7B | 32 | n150 | 70 | 18.3 | 26 | 585.6 | v0.59.0-rc42 | |
| Falcon 7B (DP=8) | 256 | QuietBox | 87 | 15.9 | 26 | 4070.4 | v0.59.0-rc38 | |
| Falcon 7B (DP=32) | 1024 | Galaxy | 125 | 12.5 | 26 | 12800.0 | v0.59.0-rc38 | |
| Falcon 40B (TP=8) | 32 | QuietBox | | 11.9 | 36 | 380.8 | v0.59.0-rc38 | |
| Llama 3.1 8B | 32 | p100 | 87* | 26.5* | | 848.0* | v0.59.0-rc3 | 739dcaa |
| Llama 3.1 8B | 32 | p150 | 69* | 29.1* | | 931.2* | v0.59.0-rc3 | 739dcaa |
| Llama 3.1 8B (DP=2) | 64 | 2 x p150 | 64* | 18.6* | | 1190.4* | v0.59.0-rc3 | 739dcaa |
| Llama 3.1 8B | 32 | n150 | 104 | 24.6 | 23 | 787.2 | v0.57.0-rc71 | 3f59287 |
| Llama 3.2 1B | 32 | n150 | 23 | 67.6 | 160 | 2163.2 | v0.57.0-rc23 | f8b5b72 |
| Llama 3.2 3B | 32 | n150 | 53 | 43.5 | 60 | 1392.0 | v0.57.0-rc71 | 3f59287 |
| Mamba 2.8B | 32 | n150 | 35 | 14.1 | 41 | 451.2 | v0.59.0-rc38 | |
| Mistral 7B | 32 | n150 | 104 | 26.4 | 23 | 844.8 | v0.59.0-rc13 | 739dcaa |
| Mixtral 8x7B (TP=8) | 32 | QuietBox | 207 | 16.6 | 33 | 531.2 | v0.59.0-rc38 | |

Last Update: June 9, 2025

Notes:

  • ttft = time to first token | t/s/u = tokens/second/user | t/s = tokens/second; where t/s = t/s/u * batch.
  • TP = Tensor Parallel, DP = Data Parallel; the value after each is the parallelization factor across multiple devices.
  • The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba (which can accept any sequence length).
  • The t/s/u reported is the throughput of the first token generated after prefill, i.e., 1 / inter-token latency.
  • Performance numbers were collected using the tt-metal model demos (accessible via the model links). If running with a vLLM inference server, performance may be different.
  • * Blackhole software optimization is under active development. Please join us in shaping the future of open source AI!
    [Discord] [Developer Hub]
  • For more information regarding vLLM installation and environment creation visit the Tenstorrent vLLM repository.
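
The relationships stated in the notes above (t/s = t/s/u * batch, and t/s/u = 1 / inter-token latency) can be checked against a few rows of the LLM table. The following is a minimal sketch; the function names are illustrative and are not part of tt-metal or TT-NN.

```python
def aggregate_throughput(tokens_per_sec_per_user: float, batch: int) -> float:
    """Total tokens/second across the batch: t/s = t/s/u * batch."""
    return tokens_per_sec_per_user * batch


def inter_token_latency_ms(tokens_per_sec_per_user: float) -> float:
    """Invert t/s/u = 1 / inter-token latency; result in milliseconds."""
    return 1000.0 / tokens_per_sec_per_user


# (model, batch, t/s/u, reported t/s) copied from the LLM table above.
rows = [
    ("Qwen 3 32B (TP=8)", 32, 21.6, 691.2),
    ("Llama 3.2 11B Vision (TP=2)", 16, 15.8, 252.8),
    ("Falcon 7B (DP=32)", 1024, 12.5, 12800.0),
]

for model, batch, tsu, reported in rows:
    computed = aggregate_throughput(tsu, batch)
    latency = inter_token_latency_ms(tsu)
    print(f"{model}: t/s = {computed:.1f} (reported {reported}), "
          f"inter-token latency = {latency:.1f} ms")
```

Note that for data-parallel rows the batch column is already the total across devices, which is why the same formula applies to both TP and DP entries.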

Speech-to-Text

| Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s | TT-Metalium Release |
|---|---|---|---|---|---|---|---|
| Whisper (distil-large-v3) | 1 | n150 | 239 | 56.0 | 45 | 56.0 | v0.59.0-rc38 |

CNNs

| Model | Batch | Hardware | fps | Target fps | Release |
|---|---|---|---|---|---|
| ResNet-50 (224x224) | 16 | n150 | 4,700 | 7,000 | |
| ResNet-50 (224x224) (DP=2) | 32 | n300 | 9,200 | 14,000 | |
| ResNet-50 (224x224) (DP=8) | 128 | QuietBox | 35,800 | 56,000 | |
| ResNet-50 (224x224) (DP=32) | 512 | Galaxy | 96,800 | 224,000 | |
| ViT (224x224) | 8 | n150 | 1,370 | 1,600 | |
| Stable Diffusion 1.4 (512x512) | 1 | n150 | 0.160 | 0.3 | |
| YOLOv4 (320x320) | 1 | n150 | 120 | 300 | |
| YOLOv4 (640x640) | 1 | n150 | 50 | 100 | |
| SegFormer Semantic Segmentation (512x512) | 1 | n150 | 90 | 300 | |
| Stable Diffusion 3.5 medium (512x512) | 1 | n150 | 0.06 | 0.3 | |

Notes:

  • Stable Diffusion FPS is based on the time elapsed from submitting the input prompt to receiving the image from the VAE decoder.
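
Because the Stable Diffusion figures measure the full prompt-to-image time at batch 1, each fps value is the reciprocal of an end-to-end latency. A minimal sketch of that conversion, using the fps values from the CNN table (the helper name is illustrative, not a tt-metal API):

```python
def seconds_per_image(fps: float) -> float:
    """At batch 1, end-to-end latency in seconds is the reciprocal of fps."""
    return 1.0 / fps


# fps values copied from the CNN table above.
for model, fps in [("Stable Diffusion 1.4", 0.160),
                   ("Stable Diffusion 3.5 medium", 0.06)]:
    print(f"{model}: {seconds_per_image(fps):.2f} s per image")
```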

NLPs

| Model | Batch | Hardware | sen/sec | Target sen/sec | Release |
|---|---|---|---|---|---|
| BERT-Large | 8 | n150 | 270 | 400 | |

Model Updates

For the latest model updates and features, please see MODEL_UPDATES.md.

Model Bring-Up and Testing

For information on initial model bring-up procedures, please see Model Bring-Up and Testing.

TT-NN Tech Reports

Benchmarks


TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Getting started

Get started with simple kernels.

TT-Metalium Tech Reports

TT-Metalium Programming Examples

Hello World

Add Integers

Simple Tensor Manipulation

DRAM Data Movement

Eltwise

Matmul

Tools and Instruments

ttnn-visualizer is a tool for visualizing and analyzing model execution. It offers interactive graphs, memory plots, tensor details, buffer overviews, operation flow graphs, and multi-instance support, and it can load reports from local files or over SSH. Install via pip or build from source:

pip install ttnn-visualizer

Tenstorrent Bounty Program Terms and Conditions

This repo is part of Tenstorrent’s bounty program. If you are interested in helping to improve tt-metal, please read the Tenstorrent Bounty Program Terms and Conditions before heading to the issues tab. Look for issues that are tagged with both “bounty” and a difficulty level!