Installation
Via PyPI
```shell
pip install pjrt-plugin-tt==1.2.0 --extra-index-url https://pypi.eng.aws.tenstorrent.com/
pip install vllm-tt==1.2.0 --extra-index-url https://pypi.eng.aws.tenstorrent.com/
```
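To confirm that the 1.2.0 wheels are the ones actually installed in the active environment, you can read installed distribution metadata through the standard library's `importlib.metadata`. This is a minimal sketch: `check_pinned_versions` is a hypothetical helper, not part of tt-xla, and it assumes the distributions are registered under the same names used in the `pip install` commands.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions pinned by the install commands for this release.
PINNED = {"pjrt-plugin-tt": "1.2.0", "vllm-tt": "1.2.0"}


def check_pinned_versions(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> Dict[str, Optional[str]]:
    """Return {distribution: installed_version} for every mismatch.

    A value of None means the distribution is not installed at all.
    Hypothetical helper for illustration; not a tt-xla API.
    """
    mismatches: Dict[str, Optional[str]] = {}
    for dist, want in expected.items():
        try:
            got: Optional[str] = lookup(dist)
        except PackageNotFoundError:
            got = None
        if got != want:
            mismatches[dist] = got
    return mismatches


if __name__ == "__main__":
    problems = check_pinned_versions(PINNED)
    if problems:
        for dist, got in problems.items():
            print(f"{dist}: expected {PINNED[dist]}, found {got or 'not installed'}")
    else:
        print("All pinned 1.2.0 packages are installed.")
```

The `lookup` parameter defaults to `importlib.metadata.version` but can be swapped for a stub, which keeps the check testable without the packages present.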
Via Docker
```shell
docker pull ghcr.io/tenstorrent/tt-xla-slim:1.2.0
```
What's Changed
- Uplift PJRT C API header from v0.103 to v0.106 by @ajakovljevicTT in #4313
- Reland EmitPy testing by @sgligorijevicTT in #4261
- Uplift third_party/tt_forge_models to cae9ccbc67a318736f656bee9a9ea776eb73e69c 2026-04-28 by @acicovicTT in #4423
- XFail sdpa_decode affected tests by @acicovicTT in #4425
- Nightly maintenance by @sgligorijevicTT in #4426
- Disable kimi_k2 in CI by @vkovacevicTT in #4432
- [vLLM plugin] Fix 1D mesh shape and tensor sharding for tensor parallelism by @mmanzoorTT in #4421
- Uplift third_party/tt-mlir to 5b85073695682d062a0ac7fe5888bfb5b410853d 2026-04-28 by @acicovicTT in #4416
- Lower PCC threshold for siglip variant achieving PCC greater than 0.98 in N150 by @meenakshiramanathan1 in #4415
- Removing monkeypatch for jax.random.uniform by @aorlovicTT in #4427
- Remove duplicate umd shared lib files by @vvukomanTT in #4451
- Handle PCC errors in benchmark by @vkovacevicTT in #4438
- Manual tools test workflow by @acicovicTT in #4428
- Codegen multiple dirs on graph breaks instead of clobbering by @svuckovicTT in #4429
- Simplify call-test.yml by moving conditions into test matrix config by @vmilosevic in #4456
- Add initial support for composite torch.gather by @hshahTT in #4431
- update analyze-nightly skill to be reused by other skills by @ctr-pmuruganTT in #4448
- [vLLM plugin] Fix 2D-mesh device-sampler garbage output (and add coherence check) by @mmanzoorTT in #4442
- Compile only improved by @AleksKnezevic in #4271
- Make updating release notes robust by @vvukomanTT in #4501
- Create large llm test matrix, move a llama to large matrix by @sgligorijevicTT in #4454
- [vLLM plugin] vllm v0.19.0 uplift by @mmanzoorTT in #4443
- Update PyTorch tag in torch-xla build script by @mmanzoorTT in #4492
- Uplift third_party/tt-mlir to f8d3bf0e97dee04ea1783b00304b37b48d446c62 2026-05-04 by @acicovicTT in #4450
- Weekly Maintenance May 2 by @devisettymahidhar608 in #4497
- Nightly Maintenance May 1 by @devisettymahidhar608 in #4498
- Remove known failure entries for fixed training tests by @agobeljicTT in #4512
- [vLLM] Replace sort-based sampling with multi-core topk for 2x non-greedy speedup by @kmabeeTT in #4334
- Add accuracy regression check by @vvukomanTT in #4522
- Prioritize runtime_reason over static_reason in record_model_test_properties by @agobeljicTT in #4524
- Revert "[vLLM plugin] vllm v0.19.0 uplift" by @mmanzoorTT in #4516
- Uplift third_party/tt_forge_models to eba69819e9d4e5b7bd3c818656120d2c09b1a679 2026-05-05 by @acicovicTT in #4502
- [CI] Fix uplift automerge by @nsumrakTT in #4533
- Add passing p150 vLLM single_device tests to basic-test matrices by @kmabeeTT in #4515
- [vLLM plugin] Cleanly shut down after a failed test to prevent device hang by @mmanzoorTT in #4511
- Fix mesh shape when the graph has no inputs by @pglusacTT in #4439
- All-to-All Dispatch and Combine Backward by @pglusacTT in #4386
- [Krea Realtime 14B] Add initial tests for each part of the pipeline by @kamalrajkannan78 in #4513
- Add KV cache dtype conversion option by @kdimicTT in #4140
- Update status on perceiverio models by @saiarthiraguram in #4555
- Add `finding-missed-fusions` skill by @ppadjinTT in #4545
- Add claude skill for benchmark reports by @vkovacevicTT in #4514
- Default batch_size=2 for training in test_all_models_torch by @agobeljicTT in #4535
- Uplift third_party/tt-mlir to bb1deb417cef0e2c60147072cf7b6926a49ccca7 2026-05-06 by @acicovicTT in #4509
- [Benchmark] Add check for fused ops in benchmarks by @vkovacevicTT in #4434
- [vLLM] Fix flaky output coherence assertion in TP generation test by @kmabeeTT in #4563
- Fix tt-triage install by @vvukomanTT in #4553
- Uplift third_party/tt_forge_models to 9c743461b7fe91bd33683f66c9a456f0a22e1634 2026-05-07 by @acicovicTT in #4544
- Add `--force-run` option to pytest for debugging skipped tests by @agobeljicTT in #4572
- Handle ComplexType in simplifyMainFuncOp zero-attr creation by @kamalrajkannan78 in #4556
- [Benchmark] Bring back gpt_oss_120b in accuracy tests by @vkovacevicTT in #4530
- Fix masked_scatter decomposition to resolve OOM error in gemma3 multimodal models by @sonalibaskaran2499 in #4315
- sparse_mlp: pad batch to tile-align tokens for bsz<32 support by @sshonTT in #4537
- PyTorch and vLLM uplift by @mmanzoorTT in #4543
- Update the status of Vilt model in inference test config by @kamalrajkannan78 in #4576
- Lower PCC thresholds after nightly run 25529490187 by @vzeljkovicTT in #4586
- Allow overriding tt-mlir and tt-metal source dirs in third_party build by @nsmithtt in #4528
- Allow TT_RUNTIME_USING_DUALT3K to force fabric 2D init in non-distributed context by @jameszianxuTT in #4419
- [vLLM plugin] vLLM 0.19.1 uplift by @mmanzoorTT in #4588
- Add 4 layer deepseek and glm tests with weight caching by @gengelageTT in #4538
- Uplift third_party/tt_forge_models to f224af305a10d38acb9fbd72c0c3514b26ec4544 2026-05-09 by @acicovicTT in #4598
- [HunyuanVideo-1.5-Diffusers-480p_t2v_distilled] Add initial tests for each part of the pipeline by @kamalrajkannan78 in #4531
- Add model-test-emitpy preset to manual-test.yml by @svuckovicTT in #4605
- Add support for new ForgePrefillModel class in testing infra by @umalesTT in #4390
- Add llama 3.1 70b and gpt oss 120b dp by @vvukomanTT in #4606
- Testing models using sliding attention by @devisettymahidhar608 in #4287
- Nightly Maintenance May 9 by @devisettymahidhar608 in #4601
- Weekly Maintenance May 9 by @devisettymahidhar608 in #4599
- Don't download weights in GPT OSS training test by @pglusacTT in #4610
- xfail olmo and mistral_8b examples added in #4287 by @jameszianxuTT in #4625
- [vLLM plugin] Enable TP pooling test for intfloat/e5-mistral-7b-instruct by @mmanzoorTT in #4612
- [vLLM + Benchmark] 3 perf improvements: ttnn.sampling fused op, pad-batch-to-32, skip greedy on all_random by @kmabeeTT in #4536
- Add deepseek-v3.2-exp benchmark by @gengelageTT in #4557
- Add kimi-k2 benchmark back into CI by @gengelageTT in #4566
- Switch accuracy metrics to quantile-based (p5) + mean over full batch by @dgolubovicTT in #4362
- Add galaxy-wh-6u vLLM support with Mistral-Large model test by @devisettymahidhar608 in #3814
- Fix RMS norm batch parallel test by @acicovicTT in #4637
- [pjrt] expose const-eval-inputs-to-system-memory pipeline option by @sshonTT in #4593
- Uplift third_party/tt_forge_models to 7477c75d5b02f21cefb28def2f8023260fe1bb09 2026-05-12 by @acicovicTT in #4634
- sparse_mlp: add DeepSeek-V4 MoE support by @sshonTT in #4622
- Add retry on release notes update by @vvukomanTT in #4649
- Uplift third_party/tt_forge_models to 7c4207b180592babb6f472764fec3bfc99577118 2026-05-13 by @vmilosevic in #4658
- [HunyuanVideo-1.5-Diffusers-480p_t2v_distilled] Add tiled VAE decoder test by @kamalrajkannan78 in #4621
- Update Mistral model config and bump torch version in CLAUDE.md by @devisettymahidhar608 in #4656
- (Nightly) Test DeepSeek-V3.2-Exp MoE block with real HF weights by @sshonTT in #4680
- Uplift third_party/tt_forge_models to 93218a34fc9fc6a671e0e41101da470c80891b2a 2026-05-14 by @vmilosevic in #4684
- Add triage skill for unpack_forward_output FE failures by @agobeljicTT in #4549
- Update transfuser config based on latest main by @saiarthiraguram in #4700
- add configs for openlem jax model by @ctr-pmuruganTT in #4666
- add dump_irs option to upload pytest IR artifacts by @ndrakulicTT in #4506
- Align vllm_benchmark with llm_benchmark and add opt-125m tests by @alinakhanTT in #4654
- Uplift third_party/tt-mlir to eb9005fa360a80e44607e2dfd4404137b510092e 2026-05-14 by @acicovicTT in #4569
- Add required runtime debug tools to tt-xla explorer wheel by @nsumrakTT in #4385
- Add deepseek v3.2 prefill and indexer tests to nightly by @gengelageTT in #4669
- [pjrt] release host source after layout migration by @sshonTT in #4594
- Uplift third_party/tt_forge_models to a64a98131c35b010895198f489355d0e6306934f 2026-05-15 by @vmilosevic in #4715
- Update test config statuses for YOLOv9, YOLOS, and OLM OCR by @agobeljicTT in #4723
- Update the inference test config of d_fine variants based on latest main results by @kamalrajkannan78 in #4717
- [Benchmark] Make sure decode PCC comparison uses same input as golden by @odjuricicTT in #4661
- [vLLM tests] Fix INTERNALERROR in cleanup hook; relax `test_seed_mixed_batch` xfail to strict=False (onPR Flaky Fail) by @kmabeeTT in #4736
- [Composite OPs] Register autograd for `xla::mark_tensor` by @umalesTT in #4731
- Enable optimization level configuration for tests in test infra by @acicovicTT in #4636
- Migrate tt-xla docs to the sphinx backend by @acicovicTT in #4686
- Set opt. lvl. in fusion tests for concatenate_heads to 1 by @acicovicTT in #4763
- [PJRT] stop clearing program cache on executable destroy by @mstojkovicTT in #4734
- Apply torch.manual_seed fixture to benchmark tests by @gengelageTT in #4713
- [CI] Pull sfpi for local builds by @nsumrakTT in #4769
- Fix: correct image path in README by @devisettymahidhar608 in #4770
- Enable ttmlir python tools in debug build by @ndrakulicTT in #4776
- Nightly Maintenance May 16 by @devisettymahidhar608 in #4739
- Uplift third_party/tt_forge_models to 6519407b21b991539aa75880f5b9333c80475991 2026-05-20 by @vmilosevic in #4793
- Adds config for motif model by @saiarthiraguram in #4745
- Update the inference test config of panoptic segmentation variants w.r.t current main by @kamalrajkannan78 in #4795
- Nightly Maintenance May 19 by @devisettymahidhar608 in #4796
- Add Playground v2.5 component tests by @kamalrajkannan78 in #4711
- Add HiDream-I1-Fast component tests by @kamalrajkannan78 in #4759
- Add SDXL-Lightning component tests by @kamalrajkannan78 in #4730
- Add working vLLM benchmarks for single-chip and TP models to CI by @alinakhanTT in #4732
- [vLLM plugin] Unify pooling runner input layout to [num_reqs, tokens] by @mmanzoorTT in #4673
- Fix off by 1 in accuracy benchmark loop by @odjuricicTT in #4807
- [CI] Move p150 perf benchmarks from experimental to main nightly by @rpavlovicTT in #4808
- Add scatter add tests for ttir.embedding_backward coverage by @ddilbazTT in #4826
- Update Pi0, GR00T, and DeepSeek OCR inference configs by @ashokkumarkannan1 in #4683
- Uplift third_party/tt-mlir to 2bd67018499ffa0ed9a0aee507325d75a8e46b84 2026-05-21 by @vmilosevic in #4716
- Uplift third_party/tt_forge_models to f96d6a82a01cb2fe2133d45431b2a6620fc7c792 2026-05-21 by @vmilosevic in #4817
- Bump opt. lvl. for Qwen 3 4B to 1 by @acicovicTT in #4867
- Add HunyuanImage 2.1 component tests (text_encoder, text_encoder_2, transformer, vae) by @kamalrajkannan78 in #4782
- Update kimi-k2 mla cache test to use tt-forge-models by @gengelageTT in #4830
- Add LLMBox Deepseek v4 tests by @hshahTT in #4743
- Uplift third_party/tt_forge_models to 7201811e7020d0e35e908df47a9e57926ba0aa1c 2026-05-23 by @vmilosevic in #4882
Full Changelog: 1.1.0...1.2.0
LLM Performance
| Model | Tokens/sec/user | Batch | Tokens/sec | TTFT (ms) |
|---|---:|---:|---:|---:|
| facebook/opt-125m | 6.0 | 1 | 6.0 | 175.07 |
| pytorch_Falcon_3_1B_Base_nlp_causal_lm_huggingface | 57.0 | 32 | 1824.0 | 281.39 |
| pytorch_Falcon_3_3B_Base_nlp_causal_lm_huggingface | 37.0 | 32 | 1184.0 | 385.3 |
| pytorch_Gemma_1.1_2B_IT_nlp_causal_lm_huggingface | 40.0 | 32 | 1280.0 | 428.1 |
| pytorch_Llama_3.1_8B_Instruct_nlp_causal_lm_huggingface | 22.0 | 32 | 704.0 | 655.24 |
| pytorch_Llama_3.2_1B_Instruct_nlp_causal_lm_huggingface | 68.0 | 32 | 2176.0 | 248.85 |
| pytorch_Mistral_7B_INSTRUCT_v03_nlp_causal_lm_huggingface | 20.0 | 32 | 640.0 | 638.69 |
| pytorch_Mistral_Ministral_8B_Instruct_nlp_causal_lm_huggingface | 12.0 | 32 | 384.0 | 304.81 |
| pytorch_Phi-1.5_Phi_1_5_nlp_causal_lm_huggingface | 24.0 | 32 | 768.0 | 462.4 |
| pytorch_Phi-1_Phi_1_nlp_causal_lm_huggingface | 24.0 | 32 | 768.0 | 457.77 |
| pytorch_Phi-2_Phi_2_nlp_causal_lm_huggingface | 11.0 | 32 | 352.0 | 1002.6 |
| pytorch_Qwen 2.5_0.5B_Instruct_nlp_causal_lm_huggingface | 81.0 | 32 | 2592.0 | 286.17 |
| pytorch_Qwen 2.5_1.5B_Instruct_nlp_causal_lm_huggingface | 39.0 | 32 | 1248.0 | 350.69 |
| pytorch_Qwen 2.5_3B_Instruct_nlp_causal_lm_huggingface | 33.0 | 32 | 1056.0 | 531.72 |
| pytorch_Qwen 2.5_7B_Instruct_nlp_causal_lm_huggingface | 16.0 | 32 | 512.0 | 759.47 |
| pytorch_Qwen 3_0_6B_nlp_causal_lm_huggingface | 36.0 | 32 | 1152.0 | 451.46 |
| pytorch_Qwen 3_1_7B_nlp_causal_lm_huggingface | 30.0 | 32 | 960.0 | 490.97 |
| pytorch_Qwen 3_4B_nlp_causal_lm_huggingface | 18.0 | 32 | 576.0 | 683.49 |
| pytorch_Qwen 3_8B_nlp_causal_lm_huggingface | 13.0 | 32 | 416.0 | 806.92 |
| tiiuae/Falcon3-1B-Base | 32.0 | 1 | 32.0 | 50.67 |
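In every row of the LLM table, the aggregate throughput equals the per-user throughput multiplied by the batch size, since all users in a batch decode in parallel. The sketch below reproduces that relationship from a few rows and, under an added assumption that the decode rate stays constant after the first token, gives a rough single-user time to produce `n` tokens as TTFT plus (n - 1) divided by the per-user rate. `LlmRow`, `aggregate_tok_s`, and `estimated_latency_s` are illustrative names, and the latency figure is a back-of-the-envelope estimate, not a measured result.

```python
from typing import NamedTuple


class LlmRow(NamedTuple):
    model: str
    tok_s_user: float  # per-user decode throughput, tokens/sec
    batch: int         # concurrent users in the batch
    tok_s: float       # aggregate throughput as reported, tokens/sec
    ttft_ms: float     # time to first token, milliseconds


# A few rows copied from the LLM Performance table.
ROWS = [
    LlmRow("pytorch_Falcon_3_1B_Base_nlp_causal_lm_huggingface", 57.0, 32, 1824.0, 281.39),
    LlmRow("pytorch_Llama_3.1_8B_Instruct_nlp_causal_lm_huggingface", 22.0, 32, 704.0, 655.24),
    LlmRow("tiiuae/Falcon3-1B-Base", 32.0, 1, 32.0, 50.67),
]


def aggregate_tok_s(tok_s_user: float, batch: int) -> float:
    """Aggregate throughput when every user in the batch decodes in parallel."""
    return tok_s_user * batch


def estimated_latency_s(row: LlmRow, n_tokens: int) -> float:
    """Rough single-user latency for n_tokens, ASSUMING a constant decode
    rate after the first token (a simplification, not a measurement)."""
    return row.ttft_ms / 1000.0 + (n_tokens - 1) / row.tok_s_user


for r in ROWS:
    # The reported aggregate column matches per-user rate x batch.
    assert aggregate_tok_s(r.tok_s_user, r.batch) == r.tok_s
    print(f"{r.model}: ~{estimated_latency_s(r, 129):.2f} s for 129 tokens")
```

Real generation latency also depends on prompt length, sampling settings, and scheduling, so treat the estimate only as an order-of-magnitude guide.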
Non-LLM Performance
| Model | Batch | Samples/sec |
|---|---:|---:|
| pytorch_BERT_emrecan/bert-base-turkish-cased-mean-nli-stsb-tr_nlp_embed_gen_huggingface | 8 | 44.0 |
| pytorch_BGE-M3_Base_nlp_embed_gen_custom | 4 | 9.0 |
| pytorch_EfficientNet_Timm_B0_cv_image_cls_timm | 8 | 332.0 |
| pytorch_MNIST_Cnn_Dropout_cv_image_cls_custom | 32 | 14688.0 |
| pytorch_MobileNetV2_Mobilenet_v2_cv_image_cls_torch_hub | 12 | 1252.0 |
| pytorch_Qwen 3_Embedding_4B_nlp_embed_gen_huggingface | 32 | 46.0 |
| pytorch_ResNet_ResNet50_HuggingFace_cv_image_cls_huggingface | 8 | 1353.0 |
| pytorch_SegFormer_B0_Finetuned_Ade_512_512_cv_image_seg_huggingface | 1 | 38.0 |
| pytorch_Swin_S_cv_image_cls_torchvision | 1 | 9.0 |
| pytorch_U-Net for Conditional Generation_Base_conditional_generation_huggingface | 1 | 3.0 |
| pytorch_Ultra-Fast Lane Detection v2_TuSimple_ResNet34_Backbone_cv_image_seg_github | 1 | 143.0 |
| pytorch_VGG19-UNet_base_cv_image_seg_custom | 1 | 151.0 |
| pytorch_ViT_Base_cv_image_cls_huggingface | 8 | 237.0 |
| pytorch_VoVNet_Ese_Vovnet19b_Dw.ra_In1k_cv_image_cls_timm | 8 | 713.0 |
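If the samples-per-second figure counts individual samples and batches are processed back to back (both are assumptions, not stated in these notes), the average wall time per batch is simply the batch size divided by the throughput. The sketch below applies that to a few rows of the Non-LLM table; `batch_latency_ms` is an illustrative name.

```python
def batch_latency_ms(batch: int, samples_per_s: float) -> float:
    """Average wall time per batch in milliseconds, ASSUMING throughput
    counts individual samples and batches run back to back."""
    if samples_per_s <= 0:
        raise ValueError("samples_per_s must be positive")
    return 1000.0 * batch / samples_per_s


# (model, batch, samples/sec) copied from the Non-LLM Performance table.
NON_LLM_ROWS = [
    ("pytorch_ResNet_ResNet50_HuggingFace_cv_image_cls_huggingface", 8, 1353.0),
    ("pytorch_MNIST_Cnn_Dropout_cv_image_cls_custom", 32, 14688.0),
    ("pytorch_Swin_S_cv_image_cls_torchvision", 1, 9.0),
]

for model, batch, sps in NON_LLM_ROWS:
    print(f"{model}: ~{batch_latency_ms(batch, sps):.1f} ms per batch of {batch}")
```

For batch-1 rows, this reduces to the per-sample latency; for larger batches it says nothing about how latency is distributed within a batch.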
Model coverage
Info: Full list of supported models is available in the assets section.
| Model task | Model architecture | Model variant | Model framework | Inference | Training | n150 | n300 | p150 | Single device | Data parallel | Tensor parallel | Model source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| conditional generation | U-Net for Conditional Generation | Base | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | AlexNet | Custom 1x2 | jax | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| cv image cls | DINOv2 | Small | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | EfficientNet | B0 | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image cls | MNIST | Cnn Batchnorm | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Cnn Dropout | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Cnn Dropout | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Cnn Nodropout | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Mlp Custom | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Mlp Custom | jax | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MNIST | Mlp Custom 1x2 | jax | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| cv image cls | MobileNetV1 | Mobilenet v1 | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | MobileNetV2 | Mobilenet v2 | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image cls | ResNet | ResNet50 HuggingFace High Resolution | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | SegFormer | Mit B0 | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image cls | Swin | S | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | VGG | HF Vgg19 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image cls | ViT | Base | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image cls | VoVNet | Ese Vovnet19b Dw.ra In1k | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv image seg | MaskFormer Swin-B | Swin Base Coco | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image seg | Ultra-Fast Lane Detection | TuSimple ResNet18 Backbone | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv image seg | VGG19-UNet | base | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv img to img | Autoencoder | linear | pytorch | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | Attention DenseUNet | Base | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | DETR | ResNet50 Backbone | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | OWL-ViT | Base Patch32 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | PointPillars | pointpillars | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | YOLOP | Default | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | YOLOS Small | Small | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | YOLOv4 | Base | pytorch | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | View Source |
| cv object det | YOLOv7 | Default | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | YOLOv9 | T | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv object det | ssd512 | ssd512 | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| cv panoptic seg | Panoptic Segmentation | ResNet50 Backbone 1x COCO | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm action prediction | OpenVLA-OFT | Finetuned Libero 10 | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm action prediction | pi_0 | pi0 base | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm image text similarity | CLIP | Base Patch16 | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm image text similarity | SigLIP | Base Patch16 224 | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| mm visual qa | Mistral | base | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | ALLaM | 7B Instruct | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Command_A_Reasoning | command-a-reasoning-08-2025 | pytorch | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Falcon | 3 10B Base | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Falcon | 3 1B Base | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Falcon | 3 3B Base | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Falcon | 3 7B Base | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | GPT-2 | Base | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | GPT-2 | Xl | jax | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | GPT-OSS | 20B | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Gemma | 1.1 2B IT | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Gemma | 1.1 7B IT | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Gemma | 2 27B IT | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Gemma | 2 2B IT | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Gemma | 2 9B IT | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Llama | 3.1 70B | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Llama | 3.1 70B Instruct | pytorch | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Llama | 3.1 8B Instruct | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Llama | 3.2 1B | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Llama | 3.2 3B | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Llama | 3.3 70B Instruct | pytorch | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | 7B INSTRUCT v03 | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Devstral Small 2505 | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Large INSTRUCT 2411 | pytorch | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Magistral Small 2506 | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Ministral 8B Instruct | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Nemo INSTRUCT 2407 | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Mistral | Small 24B INSTRUCT 2501 | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Phi-1 | Phi 1 | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-1 | Phi 1 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-1.5 | Phi 1 5 | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-1.5 | Phi 1 5 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-2 | Phi 2 | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-3 | Mini 128K Instruct | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-3 | Mini 4K Instruct | pytorch | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-3 | Mini Instruct | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Phi-4 | Phi 4 | pytorch | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | View Source |
| nlp causal lm | Qwen 2 | Qwq 32B | pytorch | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | View Source |
| nlp causal lm | Qwen 2.5 | 0.5B | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Qwen 2.5 | 0.5B Instruct | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Qwen 2.5 | 0.5B Instruct | pytorch | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |
| nlp causal lm | Qwen 2.5 | 1.5B Instruct | jax | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | View Source |