Native MTP speculative decoding on Apple Silicon | 2x to 2.5x decode TPS increase at temperature 0.6 | MLX-native, OpenAI- and Anthropic-compatible API serving, no external drafter.
metal mtp mlx inference-engine apple-silicon local-ai qwen speculative-decoding speculative-sampling openai-compatible qwen3-next anthropic-compatible native-mtp mtplx
Updated May 5, 2026 - Python