Scale your LLM-as-a-judge.
Updated Jun 26, 2025 (Jupyter Notebook)
Official repository of the paper "PokeChamp: an Expert-level Minimax Language Agent for Competitive Pokemon".
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Test-Time Memory Framework: Control Hallucinations in Foundation Models
Code for the ICML 2025 paper "How Do Large Language Monkeys Get Their Power (Laws)?"
An experimental project using MCTS to refine LLM responses for better accuracy and decision-making.
A Framework Enabling Web Agents to Master Workflows From Human Demonstration