Production-ready test-time compute optimization framework for LLM inference. Implements Best-of-N, Sequential Revision, and Beam Search strategies. Validated with models up to 7B parameters.
machine-learning deep-learning optimization transformers inference pytorch llm llm-orchestration inference-scaling self-hosted-ai test-time-compute context-extension recursive-language-models compute-optimal-inference verifier-models
Updated Jan 27, 2026 - Python
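
To make the Best-of-N strategy named in the description concrete, here is a minimal sketch of the general technique. It is not this framework's API: `best_of_n`, `generate`, `score`, `make_toy_generator`, and `toy_score` are hypothetical names chosen for illustration, and the toy generator and scorer stand in for a real sampling model and verifier model.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n candidates for the prompt and return the best one with its score.

    `generate` and `score` are hypothetical callables: in practice they would
    wrap a sampling LLM and a verifier (reward) model respectively.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Draw n independent candidates for the same prompt.
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # Score every candidate and keep the highest-scoring one
    # (ties resolve to the earliest candidate).
    scored = [(score(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def make_toy_generator(seed: int = 0) -> Callable[[str], str]:
    """Build a toy generator that appends a pseudo-random integer to the prompt."""
    rng = random.Random(seed)

    def generate(prompt: str) -> str:
        return f"{prompt} {rng.randint(0, 99)}"

    return generate


def toy_score(prompt: str, candidate: str) -> float:
    """Toy verifier: treat the trailing integer of the candidate as its quality."""
    return float(candidate.rsplit(" ", 1)[-1])


if __name__ == "__main__":
    answer, quality = best_of_n("answer:", make_toy_generator(), toy_score, n=8)
    print(answer, quality)
```

Raising `n` spends more inference compute in exchange for a better chance that at least one candidate scores well, which is the basic trade-off behind test-time compute scaling. The selected answer is only as good as the scorer, so a real verifier matters more than the sampling loop itself.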