Scale your LLM-as-a-judge.
Updated Jun 26, 2025 - Jupyter Notebook
Designing Multi-Agent Systems with Zero Supervision
Inference-Time Alignment in Protein Diffusion Models
Sys2Bench is a benchmarking suite designed to evaluate reasoning and planning capabilities of large language models across algorithmic, logical, arithmetic, and common-sense reasoning tasks.
This repo contains the code for T-SCEND, a novel framework that significantly improves the reasoning capabilities of diffusion models through better energy-based training and scaled-up test-time computation.
The official PyTorch implementation of "Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence"
A sample Bedrock wrapper for inference-time compute tradeoffs
Code for the paper "Blockwise Control for Denoising Diffusion Models"