webfuse-com/awesome-autoresearch

🔬 Awesome Autoresearch

A curated, high-signal index of autonomous improvement loops, research agents, and descendants inspired by karpathy/autoresearch.

πŸ› οΈ General-purpose descendants

  • kayba-ai/recursive-improve - Recursive self-improvement framework where agents capture execution traces, analyze failure patterns, and apply targeted fixes with keep-or-revert evaluation.
  • vukrosic/auto-research - Docs-only control plane for an open autonomous AI research lab: a file-based operating model for human direction and agent execution.
  • uditgoenka/autoresearch - Claude Code skill that generalizes autoresearch into a reusable loop for software, docs, security, shipping, debugging, and other measurable goals.
  • leo-lilinxiao/codex-autoresearch - Codex-native autoresearch skill with resume support, lessons across runs, optional parallel experiments, and mode-specific workflows.
  • SeeleAI/Thoth - Dashboard-first Claude Code and Codex runtime for autoresearch, with durable runs, locked work items, visible ledgers, and reviewable verdicts.
  • supratikpm/gemini-autoresearch - Gemini CLI skill that generalises autoresearch to any measurable goal. Gemini-native: uses Google Search grounding as a live verification source inside the loop, true headless overnight mode via --yolo --prompt, and 1M token context. Also works in Antigravity IDE via .agents/skills/.
  • davebcn87/pi-autoresearch - pi extension plus dashboard for persistent experiment loops, live metrics, confidence tracking, and resumable autoresearch sessions.
  • drivelineresearch/autoresearch-claude-code - Claude Code plugin/skill port of pi-autoresearch, with a clean experiment-loop workflow and a concrete biomechanics case study.
  • greyhaven-ai/autocontext - Closed-loop control plane for repeated agent improvement, with evaluation, persistent knowledge, staged validation, and optional distillation into cheaper local runtimes.
  • Necmttn/ax - Local retro loop for AI coding agents: captures session traces, turns repeated friction into proposals, and tracks accepted fixes as experiments.
  • jmilinovich/goal-md - Generalizes autoresearch into a GOAL.md pattern for repos where the agent must first construct a measurable fitness function before it can optimize.
  • james-s-tayler/lazy-developer - Claude Code skill that orchestrates autoresearch across a prioritized sequence of optimization goals (coverage, test speed, build speed, complexity, LOC, performance) using GOAL.md as the engine. Supports standalone and Ralph Mode multi-instance execution.
  • mutable-state-inc/autoresearch-at-home - Collaborative fork of upstream autoresearch that adds experiment claiming, shared best-config syncing, hypothesis exchange, and swarm-style coordination across many single-GPU agents.
  • zkarimi22/autoresearch-anything - Generalizes autoresearch to any measurable metric: system prompts, API performance, landing pages, test suites, config tuning, SQL queries. "If you can measure it, you can optimize it."
  • Entrpi/autoresearch-everywhere - Cross-platform expansion that auto-detects hardware config and starts the loop. The "glue and generalization" half of autoresearch.
  • ShengranHu/ADAS - Automated Design of Agentic Systems (ICLR 2025). Meta-agents that invent novel agent architectures by programming them in code.
  • MaximeRobeyns/self_improving_coding_agent - SICA: Self-Improving Coding Agent that edits its own codebase. ICLR 2025 Workshop paper demonstrating scaffold-level self-improvement on coding benchmarks.
  • peterskoett/self-improving-agent - Alternative self-improving agent architecture with reflection and meta-learning cycles.
  • metauto-ai/HGM - Huxley-Gödel Machine for coding agents; applies self-improvement to SWE-bench performance via meta-level optimization.
  • gepa-ai/gepa - GEPA (Genetic-Pareto), ICLR 2026 Oral. Reflective prompt evolution that outperforms RL (GRPO) on benchmarks. Optimizes any textual parameters against any metric using natural language reflection.
  • sentient-agi/EvoSkill - Automated skill discovery for coding agents: evolves reusable skills and prompts from failed trajectories against benchmarks, with support for Claude Code, Codex CLI, OpenCode, OpenHands, and Goose.
  • MrTsepa/autoevolve - GEPA-inspired autoresearch for self-play: mutate code strategies, evaluate head-to-head, rate with Elo/Bradley-Terry, branch from the Pareto front. Agent reads match traces to target mutations. Works as a Claude Code skill.
  • HKUDS/ClawTeam - Agent swarm intelligence for autoresearch: spawns parallel GPU research directions, distributes work across agents, aggregates results.
  • Orchestra-Research/AI-Research-SKILLs - Comprehensive skill library including autoresearch orchestration with two-loop architecture (inner optimization + outer synthesis).
  • WecoAI/aideml - AIDE: Tree-search ML engineering agent that autonomously improves model performance via iterative code generation and evaluation.
  • weco.ai - Weco: Cloud platform for AIDE with observability, experiment tracking, and managed runs; brings the autoresearch loop into production.
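
Several entries above rank candidates from head-to-head results (autoevolve, for example, mentions Elo/Bradley-Terry ratings). As a rough illustration only, a standard Elo update could be sketched as follows. The function names, the K-factor of 32, and the 400-point scale are conventional defaults assumed here; they are not taken from any listed project.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float,
    rating_b: float,
    score_a: float,
    k: float = 32.0,  # assumed K-factor; real projects may tune this
) -> Tuple[float, float]:
    """Return new (A, B) ratings after one match.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated strategies; A wins the match.
    print(update_elo(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Because the update is zero-sum, the total rating across the population stays constant, which keeps ratings comparable as new candidate strategies are added.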

🔬 Research-agent systems

  • aiming-lab/AutoResearchClaw - End-to-end research pipeline that turns a topic into literature review, experiments, analysis, peer review, and paper drafts; broader than autoresearch, but clearly in the same lineage.
  • OpenRaiser/NanoResearch - End-to-end autonomous research engine that plans experiments, generates code, runs jobs locally or on SLURM, analyzes real results, and writes papers grounded in those outputs.
  • kaust-ark/ARK - ARK (Automatic Research Kit): idea + venue → paper pipeline orchestrating 6 agents: proposal analysis, literature search, Slurm experiments, LaTeX drafting, iterative peer review. Controlled via CLI, web dashboard, or Telegram.
  • wanshuiyin/Auto-claude-code-research-in-sleep - Markdown-first research workflows for Claude Code and other agents, centered on autonomous literature review, experiments, paper iteration, and cross-model critique.
  • skyllwt/AutoSci - Wiki-centric full-lifecycle research platform built on Claude Code, realizing Karpathy's LLM-Wiki vision. 20+ skills cover the full loop: ingest → ideate → novelty check → experiment design / run / eval → paper writing. Research state lives in a structured knowledge wiki with an interactive graph.
  • Sibyl-Research-Team/AutoResearch-SibylSystem - Fully autonomous AI scientist built on Claude Code, with explicit AutoResearch lineage, multi-agent research iteration, GPU experiment execution, and a self-evolving outer loop.
  • eimenhmdt/autoresearcher - Early open-source package for automating scientific workflows, currently centered on literature-review generation with an ambition toward broader autonomous research.
  • hyperspaceai/agi - Distributed, peer-to-peer research network where autonomous agents run experiments, gossip findings, maintain CRDT leaderboards, and archive results to GitHub across multiple research domains.
  • Human-Agent-Society/CORAL - CORAL: Autonomous multi-agent evolution for open-ended discovery (arXiv:2604.01658). Long-running agents with shared persistent memory, asynchronous execution, and heartbeat-based interventions; SOTA on 10 math/algorithmic/systems tasks.
  • SakanaAI/AI-Scientist - The AI Scientist: First comprehensive system for fully automatic scientific discovery. From idea generation to paper writing with minimal human supervision.
  • SakanaAI/AI-Scientist-v2 - Workshop-level automated scientific discovery via agentic tree search. Removes template dependency from v1, generalizes across research domains.
  • AweAI-Team/AiScientist - AiScientist: long-horizon ML research lab with hierarchical orchestration and File-as-Bus coordination: workspace files act as the durable system of record. Drives autonomous paper-reproduction (PaperBench) and competition-style MLE-Bench iteration loops under fixed compute/time budgets. (arXiv 2604.13018)
  • HKUDS/AI-Researcher - NeurIPS 2025 paper. Full end-to-end research automation: hypothesis → experiments → manuscript → peer review. Production version at novix.science.
  • openags/Auto-Research - OpenAGS: Orchestrates a team of AI agents across the full research lifecycle: lit review, hypothesis generation, experiments, manuscript writing, and peer review.
  • SamuelSchmidgall/AgentLaboratory - End-to-end autonomous research workflow: idea → literature review → experiments → report. Supports both autonomous and co-pilot modes.
  • AgentRxiv - Collaborative autonomous research framework where agent laboratories share a preprint server to build on each other's work iteratively.
  • JinheonBaek/ResearchAgent - Iterative research idea generation over scientific literature with LLMs. Multi-agent review and feedback loops.
  • du-nlp-lab/MLR-Copilot - Autonomous ML research framework: generates ideas, implements experiments, analyzes results.
  • MASWorks/ML-Agent - Reinforcing LLM agents for autonomous ML engineering. Learns from trial and error to improve model performance.
  • PouriaRouzrokh/LatteReview - Low-code Python package for automated systematic literature reviews via AI-powered agents.
  • LitLLM/LitLLM - AI-powered literature review assistant using RAG for accurate, well-structured related-work sections in academic writing.
  • Agent Laboratory - Three-phase research pipeline: Literature Review → Experimentation → Report Writing, with specialized agents for each phase.

💻 Platform ports and hardware forks

  • gianfrancopiana/openclaw-autoresearch - OpenClaw port of pi-autoresearch; autonomous experiment loop for any optimization target with statistical confidence scoring.
  • miolini/autoresearch-macos - Widely adopted macOS fork that adapts upstream autoresearch for Apple Silicon / MPS while preserving the original loop shape.
  • trevin-creator/autoresearch-mlx - MLX-native Apple Silicon port that keeps the upstream fixed-budget val_bpb loop while removing the PyTorch/CUDA dependency entirely.
  • jsegov/autoresearch-win-rtx - Windows-native RTX fork focused on consumer NVIDIA GPUs, with explicit VRAM floors and a practical desktop setup path.
  • iii-hq/n-autoresearch - Multi-GPU autoresearch infrastructure with structured experiment tracking, adaptive search strategy, crash recovery, and queryable orchestration around the classic train.py loop.
  • lucasgelfond/autoresearch-webgpu - Browser/WebGPU port that lets agents generate training code, run experiments in-browser, and feed results back into the loop without a Python setup.
  • tonitangpotato/autoresearch-engram - Fork with persistent cognitive memory: frequency-weighted retrieval of cross-session knowledge for improved experiment continuity.
  • Colab/Kaggle T4 port - Adapts autoresearch for free T4 GPUs (Google Colab / Kaggle) with zero cost and zero local setup. Key changes: Flash Attention 3 → PyTorch SDPA, removes H100-only kernel dependency.
  • ArmanJR-Lab/autoautoresearch - Jetson AGX Orin port with a director, a Go binary that acts as a "creative director" injecting novelty (arxiv papers + DeepSeek Reasoner) into the loop to escape local minima. Includes multi-experiment comparison (baseline vs director-guided) with detailed stall analysis.

🎯 Domain-specific adaptations

  • mattprusak/autoresearch-genealogy - Applies the autoresearch pattern to genealogy, using structured prompts, archive guides, source checks, and vault workflows to iteratively expand and verify family-history research.
  • ArchishmanSengupta/autovoiceevals - Uses adversarial callers plus keep-or-revert prompt edits to harden voice AI agents across Vapi, Smallest AI, and ElevenLabs.
  • chrisworsey55/atlas-gic - Applies the autoresearch keep-or-revert loop to trading agents, optimizing prompts and portfolio orchestration against rolling Sharpe ratio instead of model loss.
  • RightNow-AI/autokernel - Applies the autoresearch loop to GPU kernel optimization: profile bottlenecks, edit one kernel, benchmark, keep or revert, repeat.
  • ElliotXie/autozyme - Multi-agent framework that applies the autoresearch keep-or-revert loop to CPU-side scientific software: profile a target function, generate one optimization candidate, benchmark for speed while preserving the original outputs, keep or revert, repeat.
  • Agent-Analytics/autoresearch-growth - Applies autoresearch to landing-page positioning and A/B test candidates, using analytics snapshots and measured experiment results to seed subsequent rounds.
  • Rkcr7/autoresearch-sudoku - Enhanced autoresearch workflow where an AI agent iteratively rewrites and benchmarks a Rust sudoku solver, ultimately beating leading human-built solvers on hard benchmark sets.
  • jeongph/autospec - Reads natural-language business rules and autonomously builds a Spring Boot service with tests via the keep-or-revert loop. Evaluates with Gradle build + JUnit XML. 119-line skeleton to 950 lines in 5 cycles.
  • vlasenkoalexey/tpu_performance_autoresearch_wiki - Applies the autoresearch keep-or-revert loop to TPU model performance (MFU / tokens-per-sec) on v6e hardware: profiles each run through an XProf MCP server, makes one model-code change per experiment, and keeps or reverts against measured MFU. Pairs the loop with a Karpathy-style LLM wiki for domain knowledge and per-experiment optimization traces; includes Llama3-8B and Qwen3-8B case studies across JAX and torchax lanes.
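
Most entries in this section share the same keep-or-revert loop: make one change, measure it, keep it if the metric improves, otherwise revert. A minimal sketch of that loop, assuming a higher-is-better score, is shown below. The names `keep_or_revert`, `propose`, and `score` are hypothetical and chosen for illustration; they do not come from any listed repository.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def keep_or_revert(
    initial: T,
    propose: Callable[[T], T],    # hypothetical: returns one modified candidate
    score: Callable[[T], float],  # hypothetical: higher is better
    rounds: int,
) -> Tuple[T, float, int]:
    """Run a keep-or-revert loop and return (best, best_score, kept_count)."""
    best = initial
    best_score = score(best)
    kept = 0
    for _ in range(rounds):
        candidate = propose(best)           # make exactly one change
        candidate_score = score(candidate)  # benchmark the candidate
        if candidate_score > best_score:    # keep only strict improvements
            best, best_score = candidate, candidate_score
            kept += 1
        # otherwise "revert": best is left untouched
    return best, best_score, kept


if __name__ == "__main__":
    # Toy problem: step an integer toward 3, scored by negative distance.
    result = keep_or_revert(
        initial=0,
        propose=lambda x: x + 1,
        score=lambda x: -abs(x - 3),
        rounds=10,
    )
    print(result)  # (3, 0, 3)
```

In a real project the candidate would typically be a code or prompt change on a branch, and reverting would mean discarding that change; the toy integer stands in for that state here.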

📊 Evaluation & benchmarks

  • snap-stanford/MLAgentBench - Benchmark suite for evaluating AI agents on ML experimentation tasks. 13 tasks from CIFAR-10 to BabyLM.
  • OpenAI/mle-bench - OpenAI's benchmark for measuring how well AI agents perform at ML engineering.
  • chchenhui/mlrbench - MLR-Bench: Evaluating AI agents on open-ended ML research. 201 tasks from NeurIPS/ICLR/ICML workshops.
  • gersteinlab/ML-Bench - Evaluates LLMs and agents for ML tasks on repository-level code.
  • THUDM/AgentBench - Comprehensive benchmark for LLM-as-Agent evaluation across 8 distinct environments. ICLR 2024.

📈 Notable use cases and writeups

📚 Related resources

Curated lists and paper collections for AI agents, autonomous systems, and automated research:

📄 License

This list is released under CC0-1.0.
