multiple tokens, and a verifier filters them using the main model’s confidence. Focuses on speed–accuracy tradeoffs, visualization, and modular design for easy benchmarking and research.
Updated Nov 9, 2025 - Jupyter Notebook
CLI for building and testing DFlash-style speculative decoding draft models.
Cross-vocabulary speculative decoding: a CPU-verifiable reference implementation and acceptance-length (tau) measurement harness.