v0.1.3 - MTP Graph Safety

Released by @weicj on 05 Jun 14:47 (commit 41f231a)

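  • Adds CUDA Graph safety handling for models that combine native MTP (multi-token prediction) speculative decoding with hybrid Mamba/GDN layers.
  • For this combination, which is risky under full graph replay, production profiles now fall back from full decode CUDA Graph replay to PIECEWISE or NONE mode.
  • The previous peak-throughput path (full decode CUDA Graph replay) remains available to benchmark profiles that explicitly set VLLM_ALLOW_MAMBA_SPEC_FULL_CUDAGRAPH=1.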
  • Updates VERSION, CHANGELOG.md, and README release markers to v0.1.3.
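The fallback rule described in these notes can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not this release's implementation: `GraphMode`, `resolve_graph_mode`, and its parameters are hypothetical names; the production vs. benchmark distinction is reduced to whether the override variable is set to "1"; and preferring PIECEWISE over NONE (using NONE only when piecewise capture is unavailable) is an assumption.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class GraphMode(Enum):
    # Hypothetical local stand-in for the CUDA Graph modes named in the notes.
    NONE = "NONE"
    PIECEWISE = "PIECEWISE"
    FULL_DECODE = "FULL_DECODE"


# Env var name taken from the release notes; the check below is a sketch.
OVERRIDE_ENV = "VLLM_ALLOW_MAMBA_SPEC_FULL_CUDAGRAPH"


def resolve_graph_mode(
    requested: GraphMode,
    uses_native_mtp: bool,
    has_mamba_or_gdn_layers: bool,
    piecewise_supported: bool = True,
    env: Optional[Mapping[str, str]] = None,
) -> GraphMode:
    """Hypothetical helper: downgrade full decode replay for MTP + Mamba/GDN."""
    env = os.environ if env is None else env
    risky = uses_native_mtp and has_mamba_or_gdn_layers
    # Only the risky combination with full decode replay is affected.
    if not risky or requested is not GraphMode.FULL_DECODE:
        return requested
    # Explicit opt-in (e.g. a benchmark profile) keeps the peak-throughput path.
    if env.get(OVERRIDE_ENV) == "1":
        return requested
    # Assumption: prefer PIECEWISE, use NONE if piecewise capture is unavailable.
    return GraphMode.PIECEWISE if piecewise_supported else GraphMode.NONE


if __name__ == "__main__":
    # Production-style call: no override set, so full decode replay is downgraded.
    print(resolve_graph_mode(GraphMode.FULL_DECODE, True, True, env={}))
    # Benchmark-style call: the override keeps full decode replay.
    print(resolve_graph_mode(GraphMode.FULL_DECODE, True, True,
                             env={OVERRIDE_ENV: "1"}))
```

Passing `env` explicitly keeps the function testable without mutating the process environment; when it is omitted, the sketch reads `os.environ`.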