Installation
Via PyPI
pip install pjrt-plugin-tt==1.3.0 --extra-index-url https://pypi.eng.aws.tenstorrent.com/
pip install vllm-tt==1.3.0 --extra-index-url https://pypi.eng.aws.tenstorrent.com/
Via Docker
docker pull ghcr.io/tenstorrent/tt-xla-slim:1.3.0
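After a pip install, one way to confirm that the expected release landed in the active environment is to query the installed distribution metadata. The sketch below uses only the Python standard library. The distribution names and the version come from the commands above; the helper names `installed_version` and `check_release` are illustrative, and the exact-match comparison assumes the wheel reports its version as plain `1.3.0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Values taken from the pip commands in this release note.
EXPECTED = "1.3.0"
PACKAGES = ("pjrt-plugin-tt", "vllm-tt")


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_release(packages=PACKAGES, expected=EXPECTED):
    """Map each distribution name to (found_version, matches_expected)."""
    report = {}
    for name in packages:
        found = installed_version(name)
        report[name] = (found, found == expected)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_release().items():
        status = "ok" if ok else "mismatch or missing"
        print(f"{name}: {found} ({status})")
```

A distribution that is not installed is reported as `None` rather than raising, so the script can run in an environment where only one of the two wheels is present.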
What's Changed
- Uplift third_party/tt-mlir to 297f7eb6c0c07b5d3d16a9f5eec807cbe0edd4c1 2026-05-24 by @vmilosevic in #4843
- [vLLM plugin] Use [1, num_devices] shape for 1D mesh by @mmanzoorTT in #4591
- Add framework column and --framework filter to JUnit XML summarizer by @devisettymahidhar608 in #4845
- Uplift third_party/tt-mlir to b4871ad192c5783a58a09e1b0627d9cf1227c5f4 2026-05-25 by @vmilosevic in #4896
- Nightly Maintenance may24 by @devisettymahidhar608 in #4898
- Add kimi-k2.5 benchmark by @gengelageTT in #4802
- [vLLM]: skip wasted profile_run() in determine_available_memory by @kmabeeTT in #4893
- Add triage skill for bfloat16 dtype-mismatch FE failures by @agobeljicTT in #4574
- Uplift third_party/tt-mlir to 3ac5318a23224a280aa926b42b5bdcf11aefe12a 2026-05-25 by @vmilosevic in #4936
- Add benchmark report to On Nightly summary by @vkovacevicTT in #4831
- Enable group_norm composite (tt-metal#40916 fixed) by @kamalrajkannan78 in #4868
- Uplift third_party/tt_forge_models to a797c897100bddeb77d43fe61fd7f2746a7246c2 2026-05-25 by @vmilosevic in #4899
- decompositions: fix scales passed as exact in upsample_nearest callers by @Dev-X25874 in #4655
- [CI] Replace fetch job id action with param on perf benchmark by @nsumrakTT in #4938
- Uplift third_party/tt_forge_models to 8fb89846e04493da0f1dae0c656b59c4a50eddf3 2026-05-26 by @vmilosevic in #4945
- Uplift third_party/tt-mlir to c5f398432a61100da79b7b9f2941130496092287 2026-05-26 by @vmilosevic in #4941
- Add Mochi-1 component tests at original resolution (text encoder, DiT, VAE decoder) by @kamalrajkannan78 in #4641
- HiDream_I1: add sharded test for text_encoder_3 by @kamalrajkannan78 in #4848
- Infer device_type from CI inputs by @vkovacevicTT in #3721
- Update Forge version in inference server when releasing monthly by @vvukomanTT in #4962
- Set opt. lvl. to 1 for SDXL-Lightning, HiDreamI1 and Playground v2.5 VAE decoder tests by @kamalrajkannan78 in #4946
- [Cog VideoX 5B] Add initial tests for each part of the pipeline by @meenakshiramanathan1 in #4558
- [Hunyuan Video] Add initial tests for each part of the pipeline by @meenakshiramanathan1 in #4517
- Uplift third_party/tt_forge_models to 3b4e360baa9457776e9b97a1137f4ccabb70d3f9 2026-05-27 by @vmilosevic in #4964
- Add QB2 to weekly runs by @devisettymahidhar608 in #4937
- [OmniGen] Add initial tests for each part of the pipeline by @meenakshiramanathan1 in #4749
- [vLLM] Add perf benchmarks for Qwen3-Embedding-4B and BGE-m3 at batch_size=1 and 32 by @alinakhanTT in #4840
- Wire up D2M Fusion Option into LLM Benchmarks + Test GPT-OSS-20B with D2M Fusion Enabled by @brapananTT in #4534
- Add Deepseek V4 Flash E2E changes to nightly CI by @hshahTT in #4841
- Uplift Transformers to 5.5.1 by @ssaliceTT in #4272
- [Lumina Image] Add initial tests for each part of the pipeline by @meenakshiramanathan1 in #4775
- Update DeepSeek-OCR single-device test config for ~0.94 PCC by @ashokkumarkannan1 in #4951
- Remove model-specific install requirements from perf benchmarks by @odjuricicTT in #4939
- pjrt+vllm_plugin: expose dram_size_bytes; use it for KV cache sizing by @kmabeeTT in #4960
- Fix tiktoken pyreq for kimi benchmarks by @gengelageTT in #4989
- uplift torch_xla by @mstojkovicTT in #4979
- [CI] Add QB2-Blackhole TP benchmarks to nightly by @rpavlovicTT in #4983
- Update vLLM benchmark CODEOWNERS by @vkovacevicTT in #4984
- Add training mode and LoRA backward tests for LLM torch models by @agobeljicTT in #4219
- [Benchmarks] Remove hardcoded `arch` report parameter by @vkovacevicTT in #4972
- Add Gemma4 e4b, 31b support for vLLM by @sshonTT in #4889
- [Composite] Add nn.RMSNorm module-form support by @kamalrajkannan78 in #4985
- [Playground v2.5] Add end-to-end pipeline example by @kamalrajkannan78 in #4992
- Reduce vLLM decode graphs from 5 to 2 by @alinakhanTT in #4789
- Add Gemma-4-31B-it to vLLM benchmarks on QB2 Blackhole by @kmabeeTT in #5012
- Lower gemma 1.1_7B_IT inference pcc threshold by @vzeljkovicTT in #5026
- [Tests] Xfail training PCC failures for phi1, phi1_lora, gemma_lora by @vzeljkovicTT in #5027
- Lower sdxl clip threshold nightly by @vzeljkovicTT in #5023
- Set targetModule path as a default for emitPy testing by @amilovanovicTT in #4819
- Add Qwen3-32B vLLM perf benchmark for QuietBox2 (batch 1) by @ssaliceTT in #5030
- [Benchmark] Fix multichip arch, perf regression check and qb2 transformers pin failures by @vkovacevicTT in #5020
- Add streaming inference for DeepSeek-V4-Flash by @sshonTT in #4811
- [vLLM] Improve test diagnostics by enabling basic logs by @mmanzoorTT in #4879
- Uplift third_party/tt_forge_models to 363958eba679bef0cf12fe6ed39e22e917048851 2026-06-02 by @vmilosevic in #5049
- Bump version to 1.3.0 by @vvukomanTT in #4991
- [wheel] Support bundling libtt-alchemist-lib.so into manylinux wheel by @svuckovicTT in #5050
- Set default kv cache dtype to bfp_bf8 by @kdimicTT in #4613
- [vLLM] Pin input shardings in the execution path to match warmup by @sshonTT in #5035
- assert_pcc=false for failing tests (phi1, phi1_lora, gemma_lora on p150) by @agobeljicTT in #5061
- [CI] Add nightly run for vLLM QB2 tests by @mmanzoorTT in #5055
- Uplift PJRT C API header from v0.106 to v0.110 by @acicovicTT in #4503
- Docs review skill by @acicovicTT in #4718
- [CI] Perf benchmark simplification by @vvukomanTT in #5078
- Uplift third_party/tt_forge_models to 6eadd3f27fa7819e6f6619484a61371d2fa44983 2026-06-04 by @vmilosevic in #5060
- Disable kv cache dtype conversion when MLA cache is used by @kdimicTT in #5082
- Fix JAX optimization_level not reaching PJRT plugin by @aorlovicTT in #4857
- Fix project name for vLLM perf tests by @mmanzoorTT in #5089
- Fix runtime errors encountered due to transformers uplift. by @devisettymahidhar608 in #5053
- Uplift third_party/tt_forge_models to 6b4b47a7c419cdc2713ddfc6e3179f61012c12f4 2026-06-05 by @vmilosevic in #5101
- [vLLM] Skip KV cache initialization by @mmanzoorTT in #5095
- [Benchmark] Run single-chip benchmark models through vllm benchmark by @vkovacevicTT in #5056
- [CI] Multihost CI integration by @nsumrakTT in #4971
- Add relative L2 error similarity metric to op-tests and benchmarks by @dgolubovicTT in #4676
- Add vLLM Llama-3.1 TP benchmarks (n300-llmbox) by @alinakhanTT in #5073
- [vLLM] Expose experimental_kv_cache_dtype + add xfailed BFP8 repro test by @kmabeeTT in #5007
- MoE backend from huggingface by @sshonTT in #4988
- Refactor Sliding Attention Overrides with Generic Model Rewrite Support by @devisettymahidhar608 in #4975
- [Perf tests] Fix benchmark TP config to avoid duplicate attributes by @mmanzoorTT in #5111
- Uplift third_party/tt_forge_models to 71584c597ec304999080596fccefea1becdd73f1 2026-06-07 by @vmilosevic in #5113
- [CI] Add option to run perf benchmark tests with custom torch-xla build by @mmanzoorTT in #4829
- Fix GPT-OSS 20B example segfault by selecting tt_dense experts backend by @devisettymahidhar608 in #5109
- [perf] Run single-chip benchmarks on stable qb2-blackhole via p150-perf label by @rpavlovicTT in #5103
- Move Training MoE Tests to QB2 by @pglusacTT in #5104
- [vLLM] Add support for Rotary Embedding with Multimodal Sections by @mmanzoorTT in #5107
- Add deepseek-v3.1 and glm4.7 benchmarks by @gengelageTT in #5097
- Add Janus-Pro T2I component bring-up tests by @ashokkumarkannan1 in #4810
- [CI] Add CPU only tests by @vvukomanTT in #5127
- Remove logical mapping of p150-perf -> qb2 by @vvukomanTT in #5126
- [GLM Image] Add initial tests for each part of the pipeline by @meenakshiramanathan1 in #4800
- Update code owner for vLLM by @mmanzoorTT in #5128
- [vLLM] Fix num_hidden_layers override for text-only models by @mmanzoorTT in #5125
- Add chisel context pytest fixture `--enable-chisel` by @ndrakulicTT in #5025
- Fix rel_l2 comparison crash on rank-2 training tensors by @agobeljicTT in #5136
- Temporarily disable gpt_oss_20b_tp_d2m test from llm benchmarks by @brapananTT in #5135
- fix vllm perf's tokens / second value by @jazpurTT in #4953
- Add missed torch tests to CI by @gengelageTT in #5131
- Implement deferred transfer of host tensor data to multihost workers by @jameszianxuTT in #4617
- [CI] Re-enable t3k multihost test by @nsumrakTT in #5143
- Don't install requirements for skipped tests by @sdjukicTT in #5138
- Uplift third_party/tt-mlir to 464a5f341908d5a3f790e697be5ffa6fd3fb6f32 2026-06-02 by @vmilosevic in #4957
- [Test] Add CPU compile-only tests by @jasonmacTT in #5033
- Fix main build break - Make host tensor shell represent strides as i64 by @jameszianxuTT in #5156
- [vLLM] batch paged_fill_cache across users in prefill (compile/perf improvement) by @kmabeeTT in #4955
- Update scatter add tests based on tt-mlir PR by @ddilbazTT in #5028
- [Nightly] Fix BH galaxy nightly failure by @sshonTT in #5094
- [Claude] Skill for model sharding by @vkovinicTT in #5134
- [CI] Check if (torch) test is added without device marker by @nsumrakTT in #5182
- [Benchmark] Add custom sharding for glm_4_7 by @mvasiljevicTT in #5139
- Ignore runtime-generated debug artifacts under generated/ by @mmanzoorTT in #5192
- Set `--no-rosegment` linker flag in manylinux builds to mitigate patchelf corruption of binaries by @acicovicTT in #5181
- [pjrt] Route non-contiguous host buffers through the owned-tensor path by @mstojkovicTT in #5059
- [CI] Use manylinux wheel for testing on push event by @nsumrakTT in #5193
- Wan5b Tests by @ppadjinTT in #4965
- Remove xfail from mistral test for vllm by @sshonTT in #5194
- Remove optimizer submesh device; let optimizer use mock device by @rpavlovicTT in #5146
- [vLLM] Remove xfails for fixed bugs (#4570, #5006); re-point logprobs xfail by @kmabeeTT in #5186
- [Playground v2.5] Wire e2e pipeline into nightly and benchmark CI by @kamalrajkannan78 in #5044
- Wan14b tests by @ppadjinTT in #5036
- Uplift third_party/tt_forge_models to 09239ae98eb4f0b03abe5240aca9418eb3131717 2026-06-13 by @vmilosevic in #5133
- Add a claude skill to compare failures between 2 nightlies by @ctr-pmuruganTT in #4358
- Uplift third_party/tt_forge_models to b27be2665e4f0b7773f5addd1879c2d90f77ce51 2026-06-15 by @vmilosevic in #5200
- [CI] Fix on push multihost test fail by @nsumrakTT in #5220
- [CI] Fix wheel build selector by @nsumrakTT in #5221
- Remove some n300-llmbox models from benchmark CI by @vkovacevicTT in #5226
- Clamp out-of-range negative aten.slice starts to -dim_size (#5199) by @kamalrajkannan78 in #5211
- Uplift third_party/tt_forge_models to 2fa8c5686d64d51ec4a8d30e21cde86a3d776bf3 2026-06-16 by @vmilosevic in #5231
- Add Lazy Execution Option in Legacy Compile Path by @pglusacTT in #5235
- [CI] Fix p150-perf shared runner name by @nsumrakTT in #5243
- [CI] Replace custom job_id action with built-in job.check_run_id by @vmilosevic in #5253
- Implementation of PJRT_Client_CreateUninitializedBuffer by @acicovicTT in #5080
- Bring back gpt_oss_20b_tp_batch_size_1 and qwen_2_5_coder_32b_instruct_tp to benchmark CI by @vkovacevicTT in #5239
- Uplift third_party/tt_forge_models to 0d3ee26c4e8be082876ff6fb04f65dc33e96189f 2026-06-17 by @vmilosevic in #5252
- Fix inflated number of devices when no tensors are sharded by @acicovicTT in #5236
- [Benchmark] Fix GLM4.7 and Deepseek-v3.1 MoE shard specs by @gengelageTT in #5240
- Updates Config for gemma4-12B model by @saiarthiraguram in #5258
- [Test] Blackhole galaxy simple test by @vkovinicTT in #5261
- [vLLM] Add Gemma 4-31B Blackhole Galaxy test through vLLM by @ddilbazTT in #5224
- [FX Fusing] Add RMSNorm fusion patterns for Llama, GPT-OSS, Gemma family, and vLLM by @alinakhanTT in #5140
- Adds sliding-window attention support for Gemma3 multimodal models by @devisettymahidhar608 in #5115
- [Test] Mark sana/1600M_1024px inference EXPECTED_PASSING by @saiarthiraguram in #5188
- [EmitPy] Add regression test for TP/EP when export_tensors=True by @amilovanovicTT in #5219
- Uplift third_party/tt_forge_models to 79ef7852d1d1c664917a6984a0d4c527825a6142 2026-06-18 by @vmilosevic in #5274
- [SDXL Lightning] Add e2e pipeline in nightly and benchmark CI by @kamalrajkannan78 in #5244
- [vLLM] Warmup phase optimization by @mmanzoorTT in #5129
- [vLLM] Fix batch-32 TP benchmark failure; set all TP tests to bs32 by @ssaliceTT in #5159
- Add generality models to tensor parallel inference test config by @devisettymahidhar608 in #5201
- Skip gpt_oss_20b_tp perf test on n300-llmbox by @vmilosevic in #5287
- move cpu_compile_only to experimental nightly by @jameszianxuTT in #5268
- Uplift third_party/tt_forge_models to 2fd9c86262aa7059a2152af43c7286d41f7a3edf 2026-06-19 by @vmilosevic in #5286
- Add triage skill for missing-input FE failures by @agobeljicTT in #4611
- Uplift third_party/tt-mlir to 70ff200c7d2fa8d8401f316a0a0b35ee88cbfb72 2026-06-18 by @vmilosevic in #5164
- [Qol] Set a default controller hostname when env var not set by @jasonmacTT in #5293
- vLLM Falcon3-7B removing num_hidden_layers config from test by @ssaliceTT in #5301
- Uplift third_party/tt_forge_models to 32d5c2e4a8cfd55b0f2ec99b3ec8d1b217fcb742 2026-06-20 by @vmilosevic in #5304
- vllm: Skip extract_nodes_info unless XLA_HLO_DEBUG=1 for compile time speedup by @kmabeeTT in #5299
- Fix runner label mapping logic in perf tests by @vvukomanTT in #5288
- Reduce DeepSeek-V4 e2e PCC test to 10 layers on BH Galaxy by @sshonTT in #5296
- Uplift third_party/tt_forge_models to 6400d1eb60ba1ca2f7ea37f8c3e613a2d744c301 2026-06-22 by @vmilosevic in #5305
- Add dependencies commits to release notes by @vvukomanTT in #5311
- [WAN 2.2] Path for sp DiT sharding by @vkovinicTT in #5309
- Fix TT_RUNTIME_DEBUG compile-time variable propagation to PJRT callers by @jameszianxuTT in #5242
- Expand perf reports to include p150 benchmarks by @vvukomanTT in #5318
- [vLLM] Fix runner to use correct sampling graph for cpu sampling by @mmanzoorTT in #5316
- Bring up Mixtral models in the vLLM plugin. by @devisettymahidhar608 in #3523
- Update test config set2 by @devisettymahidhar608 in #5320
- Bring up the Pixtral model in the vLLM plugin by @devisettymahidhar608 in #3996
- Fix libtt-alchemist-lib.so bundling into manylinux wheel by @amilovanovicTT in #5333
- Add Gemma-4 26B-A4B MoE support for vLLM on 2D mesh by @sshonTT in #5141
- [pjrt] fix prepare inputs for codegen by @pilkicTT in #5322
- [vLLM] Pin FastAPI in deps to avoid route-tree regression by @mmanzoorTT in #5335
- [vLLM] Set default optimization level to 1 by @mmanzoorTT in #5327
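Several entries above refer to PCC thresholds and to a relative L2 error metric. As a reference point, here is a minimal sketch of the textbook definitions of both, written in plain Python. How the project's test infrastructure actually computes them is not stated in these notes, so the function names and the flat-list inputs are assumptions made for illustration.

```python
import math


def pcc(actual, reference):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_r = sum(reference) / n
    cov = sum((a - mean_a) * (r - mean_r) for a, r in zip(actual, reference))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_a * var_r)


def relative_l2_error(actual, reference):
    """||actual - reference||_2 divided by ||reference||_2."""
    diff_norm = math.sqrt(sum((a - r) ** 2 for a, r in zip(actual, reference)))
    ref_norm = math.sqrt(sum(r ** 2 for r in reference))
    return diff_norm / ref_norm


if __name__ == "__main__":
    golden = [1.0, 2.0, 3.0, 4.0]
    scaled = [2.0 * x for x in golden]  # same shape, wrong magnitude
    print("pcc:", pcc(scaled, golden))
    print("relative L2 error:", relative_l2_error(scaled, golden))
```

The two metrics are complementary: correlation is invariant to a uniform rescaling of the output, whereas the relative L2 error grows with it, so a uniformly scaled output keeps a perfect PCC while its relative L2 error is large.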
New Contributors
Full Changelog: 1.2.0...1.3.0
LLM Performance
| Model | Token/sec/user | Batch | Token/sec | TTFT (ms) | Hardware |
| --- | --- | --- | --- | --- | --- |
| pytorch_DeepSeek-V3.1_deepseek_v3_1_modified_nlp_causal_lm_custom | 3.0 | 64 | 192.0 | 4159.47 | n150 |
| pytorch_Falcon_3_1B_Base_nlp_causal_lm_huggingface | 57.0 | 32 | 1824.0 | 663.53 | n150 |
| pytorch_Falcon_3_3B_Base_nlp_causal_lm_huggingface | 38.0 | 32 | 1216.0 | 865.52 | n150 |
| pytorch_Falcon_3_7B_Base_nlp_causal_lm_huggingface | 19.0 | 32 | 608.0 | 1186.81 | n150 |
| pytorch_GLM_4.7_nlp_causal_lm_huggingface | 7.0 | 64 | 448.0 | 1590.81 | n150 |
| pytorch_Gemma_1.1_2B_IT_nlp_causal_lm_huggingface | 40.0 | 32 | 1280.0 | 638.1 | n150 |
| pytorch_Kimi-K2.5_kimi_k2_5_modified_nlp_causal_lm_custom | 3.0 | 64 | 192.0 | 4587.63 | n150 |
| pytorch_Kimi-K2_kimi_k2_instruct_modified_nlp_causal_lm_custom | 3.0 | 64 | 192.0 | 4808.17 | n150 |
| pytorch_Llama_3.1_8B_Instruct_nlp_causal_lm_huggingface | 23.0 | 32 | 736.0 | 2230.65 | n150 |
| pytorch_Llama_3.2_1B_Instruct_nlp_causal_lm_huggingface | 67.0 | 32 | 2144.0 | 575.95 | n150 |
| pytorch_Llama_3.2_3B_Instruct_nlp_causal_lm_huggingface | 31.0 | 32 | 992.0 | 605.5 | n150 |
| pytorch_Mistral_7B_INSTRUCT_v03_nlp_causal_lm_huggingface | 20.0 | 32 | 640.0 | 1252.08 | n150 |
| pytorch_Mistral_Ministral_8B_Instruct_nlp_causal_lm_huggingface | 12.0 | 32 | 384.0 | 550.28 | n150 |
| pytorch_Mistral_Small_24B_INSTRUCT_2501_nlp_causal_lm_huggingface | 16.0 | 32 | 512.0 | 1790.87 | n150 |
| pytorch_Phi-1.5_Phi_1_5_nlp_causal_lm_huggingface | 22.0 | 32 | 704.0 | 617.58 | n150 |
| pytorch_Phi-1_Phi_1_nlp_causal_lm_huggingface | 22.0 | 32 | 704.0 | 632.51 | n150 |
| pytorch_Qwen 2.5_0.5B_Instruct_nlp_causal_lm_huggingface | 78.0 | 32 | 2496.0 | 408.36 | n150 |
| pytorch_Qwen 2.5_1.5B_Instruct_nlp_causal_lm_huggingface | 38.0 | 32 | 1216.0 | 461.04 | n150 |
| pytorch_Qwen 2.5_3B_Instruct_nlp_causal_lm_huggingface | 31.0 | 32 | 992.0 | 697.85 | n150 |
| pytorch_Qwen 2.5_7B_Instruct_nlp_causal_lm_huggingface | 16.0 | 32 | 512.0 | 859.7 | n150 |
| pytorch_Qwen 3_0_6B_nlp_causal_lm_huggingface | 36.0 | 32 | 1152.0 | 1163.37 | n150 |
| pytorch_Qwen 3_1_7B_nlp_causal_lm_huggingface | 30.0 | 32 | 960.0 | 746.03 | n150 |
| pytorch_Qwen 3_4B_nlp_causal_lm_huggingface | 17.0 | 32 | 544.0 | 976.23 | n150 |
| pytorch_Qwen 3_8B_nlp_causal_lm_huggingface | 13.0 | 32 | 416.0 | 1673.32 | n150 |
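In every LLM row listed above, the Token/sec figure equals Token/sec/user multiplied by Batch (for example, 57.0 × 32 = 1824.0). The short sketch below recomputes the aggregate for three rows copied from the table. The function name `aggregate_tokens_per_sec` is illustrative, and reading the column as a simple product is an inference from the published numbers, not a documented definition.

```python
def aggregate_tokens_per_sec(tokens_per_sec_per_user, batch):
    """Aggregate decode throughput, assuming every user in the batch runs at the same rate."""
    return tokens_per_sec_per_user * batch


# (model, token/sec/user, batch, reported token/sec), copied from the table.
ROWS = [
    ("pytorch_Falcon_3_1B_Base_nlp_causal_lm_huggingface", 57.0, 32, 1824.0),
    ("pytorch_GLM_4.7_nlp_causal_lm_huggingface", 7.0, 64, 448.0),
    ("pytorch_Llama_3.1_8B_Instruct_nlp_causal_lm_huggingface", 23.0, 32, 736.0),
]

if __name__ == "__main__":
    for model, per_user, batch, reported in ROWS:
        computed = aggregate_tokens_per_sec(per_user, batch)
        print(f"{model}: computed={computed}, reported={reported}")
```

When comparing rows, note that the batch size differs between models (32 or 64), so the per-user figure is the more direct measure of what a single request sees.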
Non-LLM Performance
| Model | Batch | Sample/sec | Hardware |
| --- | --- | --- | --- |
| pytorch_BERT_emrecan/bert-base-turkish-cased-mean-nli-stsb-tr_nlp_embed_gen_huggingface | 8 | 44.0 | n150 |
| pytorch_BGE-M3_Base_nlp_embed_gen_custom | 4 | 9.0 | n150 |
| pytorch_BGE-M3_Base_nlp_embed_gen_custom | 4 | 19.0 | p150 |
| pytorch_EfficientNet_Timm_B0_cv_image_cls_timm | 8 | 349.0 | n150 |
| pytorch_MNIST_Cnn_Dropout_cv_image_cls_custom | 32 | 14643.0 | n150 |
| pytorch_MobileNetV2_Mobilenet_v2_cv_image_cls_torch_hub | 12 | 1237.0 | n150 |
| pytorch_Qwen 3_Embedding_4B_nlp_embed_gen_huggingface | 32 | 46.0 | n150 |
| pytorch_ResNet_ResNet50_HuggingFace_cv_image_cls_huggingface | 8 | 1339.0 | n150 |
| pytorch_SegFormer_B0_Finetuned_Ade_512_512_cv_image_seg_huggingface | 1 | 39.0 | n150 |
| pytorch_Swin_S_cv_image_cls_torchvision | 1 | 10.0 | n150 |
| pytorch_U-Net for Conditional Generation_Base_conditional_generation_huggingface | 1 | 5.0 | n150 |
| pytorch_Ultra-Fast Lane Detection v2_TuSimple_ResNet34_Backbone_cv_image_seg_github | 1 | 136.0 | n150 |
| pytorch_VGG19-UNet_base_cv_image_seg_custom | 1 | 147.0 | n150 |
| pytorch_ViT_Base_cv_image_cls_huggingface | 8 | 229.0 | n150 |
| pytorch_VoVNet_Ese_Vovnet19b_Dw.ra_In1k_cv_image_cls_timm | 8 | 667.0 | n150 |
Model coverage
Info: Full list of supported models is available in the assets section.
| Model task | Model architecture | Model variant | Model framework | Inference | Training | n150 | n300 | p150 | Single device | Data parallel | Tensor parallel | Model source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| conditional generation | U-Net for Conditional Generation | Base | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | AlexNet | Custom 1x2 | jax | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| cv image cls | DINOv2 | Small | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | EfficientNet | B0 | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image cls | MNIST | Cnn Batchnorm | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Cnn Dropout | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Cnn Dropout | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Cnn Nodropout | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Mlp Custom | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Mlp Custom | jax | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Mlp Custom 1x2 | jax | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| cv image cls | MobileNetV1 | Mobilenet v1 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MobileNetV2 | Mobilenet v2 | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image cls | ResNet | ResNet50 HuggingFace High Resolution | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | SegFormer | Mit B0 | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image cls | Swin | S | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | VGG | HF Vgg19 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | ViT | Base | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image cls | VoVNet | Ese Vovnet19b Dw.ra In1k | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image seg | Ultra-Fast Lane Detection | TuSimple ResNet18 Backbone | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image seg | VGG19-UNet | base | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv img to img | Autoencoder | linear | pytorch | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | Attention DenseUNet | Base | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | DETR | ResNet50 Backbone | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | OWL-ViT | Base Patch32 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | PointPillars | pointpillars | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | YOLOP | Default | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | YOLOS Small | Small | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | YOLOv4 | Base | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv object det | YOLOv7 | Default | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | YOLOv9 | T | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | ssd512 | ssd512 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm action prediction | OpenVLA-OFT | Finetuned Libero 10 | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm action prediction | pi_0 | pi0 base | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm image text similarity | CLIP | Base Patch16 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm image text similarity | SigLIP | Base Patch16 224 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm visual qa | Llama | 3.2 11B Vision Instruct | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm visual qa | Mistral | base | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | ALLaM | 7B Instruct | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Command_A_Reasoning | command-a-reasoning-08-2025 | pytorch | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Falcon | 3 10B Base | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Falcon | 3 1B Base | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Falcon | 3 3B Base | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Falcon | 3 7B Base | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | GPT-2 | Base | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | GPT-2 | Xl | jax | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | GPT-OSS | 20B | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Gemma | 1.1 2B IT | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Gemma | 1.1 7B IT | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Gemma | 2 27B IT | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Gemma | 2 2B IT | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Gemma | 2 9B IT | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Llama | 3.1 70B | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Llama | 3.1 8B Instruct | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Llama | 3.2 1B | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Llama | 3.2 3B | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Llama | 3.3 70B Instruct | pytorch | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | 7B INSTRUCT v03 | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Devstral Small 2505 | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Large INSTRUCT 2411 | pytorch | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Magistral Small 2506 | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Ministral 8B Instruct | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Nemo INSTRUCT 2407 | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Small 24B INSTRUCT 2501 | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Phi-1 | Phi 1 | jax | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Phi-1 | Phi 1 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-1 | Phi 1 | pytorch | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-1 LoRA | Phi 1 | pytorch | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-1.5 | Phi 1 5 | jax | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Phi-1.5 | Phi 1 5 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-2 | Phi 2 | jax | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Phi-2 | Phi 2 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-3 | Mini 128K Instruct | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-3 | Mini 4K Instruct | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-3 | Mini Instruct | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-4 | Phi 4 | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Qwen 2 | Qwq 32B | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Qwen 2.5 | 0.5B | jax | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Qwen 2.5 | 0.5B Instruct | jax | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Qwen 2.5 | 0.5B Instruct | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |