An open-source framework that combines large language models with evolutionary algorithms to discover and optimize high-performing code solutions.
CodeEvolve democratizes algorithmic discovery by making LLM-driven evolutionary search transparent, reproducible, and accessible. Whether you're tackling combinatorial optimization, discovering novel algorithms, or optimizing computational kernels, CodeEvolve provides a modular foundation for automated code synthesis guided by quantifiable metrics.
- Why CodeEvolve?
- Key Features
- How It Works
- Architecture
- Performance Highlights
- Quick Start
- Use Cases
- Reproducing Research Results
- Documentation
- Contributing
- Citation
State-of-the-art performance with transparency. CodeEvolve matches or exceeds the performance of closed-source systems like Google DeepMind's AlphaEvolve on established algorithm-discovery benchmarks, while remaining fully open and reproducible.
Cost-effective solutions. Open-weight models like Qwen often match or outperform expensive closed-source LLMs at a fraction of the compute cost, making cutting-edge algorithmic discovery accessible to researchers and practitioners with limited budgets.
Designed for real problems. CodeEvolve addresses meta-optimization tasks where you need to discover programs that solve complex optimization problems—from mathematical constructions to scientific discovery.
Multiple populations evolve independently and periodically exchange top performers, maintaining diversity while propagating successful solutions across the search space. This parallel architecture enables efficient exploration and scales naturally to concurrent evaluation.
Inspiration-based Crossover: Contextual recombination that combines successful solution patterns while preserving semantic coherence. Parent solutions are presented to the LLM along with high-performing "inspiration" programs, allowing it to synthesize novel combinations.
Meta-prompting Exploration: Evolves the prompts themselves, enabling the LLM to reflect on and rewrite its own instructions for more diverse search trajectories. The system maintains a population of prompts that co-evolve with solutions.
Depth-based Exploitation: Targeted refinement mechanism that makes precise edits to promising solutions by maintaining conversation history. The LLM sees the full evolutionary lineage, enabling incremental improvements while preserving working components.
MAP-Elites Integration: Optional quality-diversity archive that maintains behavioral diversity across the solution space. Supports both grid-based and CVT-based (Centroidal Voronoi Tessellation) feature maps.
Ensemble Support: Mix and match multiple LLMs with weighted selection. Use different models for exploration vs. exploitation phases, or combine open-weight and proprietary models.
OpenAI-Compatible APIs: Works with any OpenAI-compatible endpoint including vLLM, Ollama, Together AI, and cloud providers.
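The weighted selection mentioned under Ensemble Support can be illustrated with a short sketch. This is not CodeEvolve's actual implementation: the function `pick_model` and the placeholder model names are hypothetical, and only the `model_name`, `temp`, and `weight` fields mirror the configuration keys used elsewhere in this README.

```python
import random

# Hypothetical ensemble; field names follow the ENSEMBLE config keys,
# the model names are placeholders.
ENSEMBLE = [
    {"model_name": "model-a", "temp": 0.8, "weight": 3},
    {"model_name": "model-b", "temp": 0.7, "weight": 1},
]


def pick_model(ensemble, rng):
    """Draw one ensemble member with probability proportional to its weight."""
    weights = [member["weight"] for member in ensemble]
    return rng.choices(ensemble, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    counts = {}
    for _ in range(1000):
        name = pick_model(ENSEMBLE, rng)["model_name"]
        counts[name] = counts.get(name, 0) + 1
    print(counts)  # model-a should be drawn roughly three times as often
```

Passing the random generator explicitly keeps draws reproducible for a fixed seed, which matters when comparing runs.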
CodeEvolve operates as a distributed evolutionary algorithm where code itself is the evolving entity:
1. Initialization: Start with an initial code template and system prompt that defines the task
2. Selection: Choose parent programs based on fitness
   - Exploration mode: Random or uniform selection for broad search
   - Exploitation mode: Tournament or roulette selection for refinement
3. Variation: Generate new candidates through LLM-driven operations
   - Exploration: Broad modifications with meta-prompting
   - Exploitation: Targeted improvements with conversation history and inspiration programs
4. Code Generation: LLM produces SEARCH/REPLACE diffs
   - Only specified code blocks are modified (between markers)
   - Preserves working code outside evolution zones
   - Applies structured diffs rather than regenerating entire files
5. Evaluation: Execute in sandboxed environment
   - Resource limits (time, memory)
   - Capture metrics and errors
   - Extract fitness from evaluation results
6. Migration: Periodically exchange top solutions between islands
   - Maintains diversity while spreading innovations
   - Prevents premature convergence
7. Archiving (optional): MAP-Elites maintains diverse solutions
   - Preserves behavioral variety
   - Enables multi-objective optimization
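The code-generation step can be sketched as follows. The exact diff syntax CodeEvolve expects is not specified in this README, so the `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` delimiters below are an assumption, as is the helper name `apply_diff`. The sketch also assumes a single evolve block per file.

```python
import re

START, END = "# EVOLVE-BLOCK-START", "# EVOLVE-BLOCK-END"

# Assumed diff syntax (not confirmed by this README).
DIFF_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def apply_diff(source, diff_text):
    """Apply SEARCH/REPLACE pairs, touching only text inside the evolve block."""
    start = source.index(START) + len(START)
    end = source.index(END)
    head, block, tail = source[:start], source[start:end], source[end:]
    for search, replace in DIFF_RE.findall(diff_text):
        if search not in block:
            # Reject edits that target code outside the evolution zone.
            raise ValueError(f"SEARCH text not found in evolve block: {search!r}")
        block = block.replace(search, replace, 1)
    return head + block + tail


if __name__ == "__main__":
    source = (
        "import math\n"
        "# EVOLVE-BLOCK-START\n"
        "def f(x):\n"
        "    return x + 1\n"
        "# EVOLVE-BLOCK-END\n"
        "print(f(1))\n"
    )
    diff = "<<<<<<< SEARCH\n    return x + 1\n=======\n    return x * 2\n>>>>>>> REPLACE"
    print(apply_diff(source, diff))
```

Because the replacement only operates on the slice between the markers, code outside the block cannot be changed even if the model proposes an edit for it.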
The system dynamically balances exploration and exploitation:
| Phase | Selection | LLM Context | Operators |
|---|---|---|---|
| Exploration | Random/uniform | No history | Meta-prompting enabled, no inspirations |
| Exploitation | Tournament/fitness | Full lineage | Inspiration programs, deep conversation |
The exploration rate is controlled by a scheduler (e.g., exponential decay) and can adapt based on fitness improvements.
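A minimal sketch of how an exponentially decaying exploration rate could drive the choice between the two phases. The class name `ExponentialDecay`, its `min_rate` floor, and the helper `choose_phase` are hypothetical; only the ideas of an initial exploration rate and a decay rate come from this README.

```python
import random


class ExponentialDecay:
    """Hypothetical scheduler: rate(epoch) = initial_rate * decay_rate ** epoch."""

    def __init__(self, initial_rate, decay_rate, min_rate=0.0):
        self.initial_rate = initial_rate
        self.decay_rate = decay_rate
        self.min_rate = min_rate  # assumed floor, not taken from this README

    def rate(self, epoch):
        return max(self.min_rate, self.initial_rate * self.decay_rate ** epoch)


def choose_phase(scheduler, epoch, rng):
    """Return 'exploration' with probability rate(epoch), else 'exploitation'."""
    return "exploration" if rng.random() < scheduler.rate(epoch) else "exploitation"


if __name__ == "__main__":
    scheduler = ExponentialDecay(initial_rate=0.3, decay_rate=0.995)
    rng = random.Random(42)
    for epoch in (0, 100, 500):
        print(epoch, round(scheduler.rate(epoch), 4), choose_phase(scheduler, epoch, rng))
```

With a decay rate close to 1, early epochs explore often and later epochs are dominated by exploitation.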
┌─────────┐     ┌─────────┐     ┌─────────┐
│Island 0 │────▶│Island 1 │────▶│Island 2 │
│Pop: 20  │     │Pop: 20  │     │Pop: 20  │
└────┬────┘     └────┬────┘     └────┬────┘
     │               │               │
     └───────────────┴───────────────┘
       Periodic Migration (Ring Topology)
Each island maintains:
- Solution population
- Prompt population (if meta-prompting enabled)
- Local fitness rankings
- Migration history
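Ring migration between islands can be sketched as below. This is an illustration under stated assumptions: populations are plain lists of `(fitness, program)` tuples, higher fitness is better, and migrants are copied rather than moved. The function name `migrate_ring` is hypothetical.

```python
def migrate_ring(islands, num_migrants):
    """Copy each island's top programs to the next island in a directed ring.

    islands: list of populations, each a list of (fitness, program) tuples.
    """
    # Snapshot emigrants first so every island sends from its pre-migration
    # population and a migrant cannot hop across several islands in one round.
    emigrants = [
        sorted(pop, key=lambda ind: ind[0], reverse=True)[:num_migrants]
        for pop in islands
    ]
    for i, movers in enumerate(emigrants):
        target = islands[(i + 1) % len(islands)]
        target.extend(m for m in movers if m not in target)
    return islands


if __name__ == "__main__":
    islands = [
        [(1.0, "a"), (5.0, "b")],
        [(2.0, "c")],
        [(9.0, "d"), (0.5, "e")],
    ]
    for i, pop in enumerate(migrate_ring(islands, num_migrants=1)):
        print(i, pop)
```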
CodeEvolve operates through an iterative process at each epoch:
1. Population Management: Each island maintains populations of prompts and solutions, evaluated against user-defined fitness metrics
2. Evolutionary Operators: Generate new candidates through crossover, mutation, and meta-prompting
3. LLM Ensemble: Transforms operator instructions into executable code modifications via structured SEARCH/REPLACE diffs
4. Selection & Migration: Top performers are retained and periodically migrated between islands
5. Archive: MAP-Elites-based archive preserves behavioral diversity across the search
Execution feedback and fitness signals guide the entire loop, translating LLM proposals into testable, executable artifacts.
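The archive step can be illustrated with a grid-based MAP-Elites sketch: the feature space is split into cells, and each cell keeps only the best program seen so far. The class `GridArchive`, its parameters, and the choice of feature descriptors are hypothetical and are not CodeEvolve's API.

```python
class GridArchive:
    """Hypothetical grid-based MAP-Elites archive (higher fitness is better)."""

    def __init__(self, bounds, bins):
        self.bounds = bounds  # list of (low, high) per feature dimension
        self.bins = bins      # number of bins per dimension
        self.cells = {}       # cell index tuple -> (fitness, program)

    def _cell(self, features):
        idx = []
        for value, (low, high) in zip(features, self.bounds):
            frac = (value - low) / (high - low)
            # Clamp so out-of-range features land in the border cells.
            idx.append(min(self.bins - 1, max(0, int(frac * self.bins))))
        return tuple(idx)

    def add(self, program, fitness, features):
        """Insert if the cell is empty or the newcomer beats the incumbent."""
        cell = self._cell(features)
        incumbent = self.cells.get(cell)
        if incumbent is None or fitness > incumbent[0]:
            self.cells[cell] = (fitness, program)
            return True
        return False


if __name__ == "__main__":
    archive = GridArchive(bounds=[(0, 10), (0, 1)], bins=5)
    archive.add("p1", 1.0, (2.1, 0.5))
    archive.add("p2", 0.5, (2.9, 0.55))  # same cell, worse fitness: rejected
    archive.add("p3", 0.1, (9.0, 0.1))   # different cell: kept despite low fitness
    print(archive.cells)
```

Keeping one elite per cell is what lets a low-fitness but behaviorally distinct program survive alongside the global best.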
- Manages populations with genealogical tracking
- Implements selection strategies: random, tournament, roulette, best
- Optional MAP-Elites integration (Grid or CVT-based)
- Automatic population size management
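Two of the selection strategies listed above, tournament and roulette, can be sketched as follows. The function names and the dictionary-based individuals are hypothetical; shifting fitness values to make roulette weights positive is one common convention and an assumption here.

```python
import random


def tournament_select(population, tournament_size, rng):
    """Sample a few individuals at random and return the fittest of them."""
    contenders = rng.sample(population, min(tournament_size, len(population)))
    return max(contenders, key=lambda ind: ind["fitness"])


def roulette_select(population, rng):
    """Pick an individual with probability proportional to shifted fitness."""
    lowest = min(ind["fitness"] for ind in population)
    # Shift so every weight is positive, even with negative fitness values.
    weights = [ind["fitness"] - lowest + 1e-9 for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    population = [{"id": i, "fitness": float(i)} for i in range(10)]
    print(tournament_select(population, tournament_size=3, rng=rng))
    print(roulette_select(population, rng=rng))
```

A larger tournament size increases selection pressure; a size of 1 reduces to random selection.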
- Main evolutionary loop coordinating all components
- Handles parent selection, variation, and evaluation
- Manages exploration/exploitation scheduling
- Checkpoint creation and restoration
- Distributed execution with multiprocessing
- Synchronous or asynchronous migration
- Shared global best tracking
- Coordinated early stopping
- OpenAI-compatible API wrapper
- Ensemble with weighted random selection
- Automatic retry with exponential backoff
- Embedding generation support
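Retry with exponential backoff, as listed above, can be sketched in a provider-agnostic way. The helper `call_with_backoff` and its parameters are hypothetical; a real client would typically catch only transient errors (rate limits, timeouts) rather than every exception.

```python
import random
import time


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel islands do not retry in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))


if __name__ == "__main__":
    attempts = []

    def flaky():
        attempts.append(1)
        if len(attempts) < 3:
            raise RuntimeError("transient failure")
        return "ok"

    # Inject a no-op sleep so the demo finishes instantly.
    print(call_with_backoff(flaky, sleep=lambda seconds: None), len(attempts))
```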
- Builds conversation histories from program lineages
- Incorporates inspiration programs for crossover
- Meta-prompting for prompt evolution
- Dynamic depth control
- Sandboxed program execution
- Memory and timeout monitoring
- Process tree management
- Metrics extraction from JSON results
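The evaluation flow (run with a timeout, then read metrics from a JSON file) can be sketched as below. This is a simplified illustration, not CodeEvolve's evaluator: `run_evaluator` and its return format are hypothetical, and the sketch enforces only a wall-clock timeout, without the memory limits or process-tree cleanup listed above.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_evaluator(eval_script, code_path, timeout_s=10.0, fitness_key="fitness"):
    """Run `eval_script code_path results_path` and return parsed metrics."""
    with tempfile.TemporaryDirectory() as tmp:
        results_path = Path(tmp) / "results.json"
        try:
            proc = subprocess.run(
                [sys.executable, str(eval_script), str(code_path), str(results_path)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "error": "timeout"}
        if proc.returncode != 0 or not results_path.exists():
            return {"ok": False, "error": proc.stderr.strip() or "no results file"}
        metrics = json.loads(results_path.read_text())
        if fitness_key not in metrics:
            return {"ok": False, "error": f"missing '{fitness_key}' in results"}
        return {"ok": True, "fitness": float(metrics[fitness_key]), "metrics": metrics}


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(run_evaluator(sys.argv[1], sys.argv[2]))
```

Returning a structured failure instead of raising lets the evolutionary loop record a failed candidate and continue with the rest of the population.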
CodeEvolve matches or exceeds AlphaEvolve on several of the benchmarks previously used to assess it:
- Competitive or better results across diverse algorithm-discovery tasks including autocorrelation inequalities, packing problems, and Heilbronn problems
- Open-weight models (e.g., Qwen) matching closed-source performance at significantly lower cost
- Extensive ablations quantifying each component's contribution to search efficiency
For comprehensive evaluation details and specific results, see our technical report.
Clone this repository and create the conda environment:
git clone https://github.com/inter-co/science-codeevolve.git
cd science-codeevolve
conda env create -f environment.yml
conda activate codeevolve

Configure your LLM provider by setting environment variables:
export API_KEY=your_api_key_here
export API_BASE=your_api_base_url

Run CodeEvolve via the command line:
codeevolve \
  --inpt_dir=INPT_DIR \
  --cfg_path=CFG_PATH \
  --out_dir=RESULTS_DIR \
  --load_ckpt=LOAD_CKPT \
  --terminal_logging

Arguments:
- --inpt_dir: Directory containing initial solution and evaluation script
- --cfg_path: Path to YAML configuration file (required for new runs)
- --out_dir: Directory where results will be saved
- --load_ckpt: Checkpoint to load (0 for new run, -1 for latest, or specific epoch)
- --terminal_logging: Enable live progress display (optional)
scripts/run.sh is a bash script that runs CodeEvolve under taskset to limit CPU usage. See src/codeevolve/cli.py for further details.
Here's a complete minimal example for optimizing a simple function:
1. Create problem directory:
mkdir -p my_problem/input

2. Create initial solution (my_problem/input/solution.py):
# EVOLVE-BLOCK-START
def objective(x):
    """Function to maximize."""
    return -(x - 3)**2 + 10
# EVOLVE-BLOCK-END

if __name__ == '__main__':
    x = 0.0  # Initial guess
    result = objective(x)
    print(f"Result: {result}")

3. Create evaluator (my_problem/input/evaluate.py):
import sys
import json
import subprocess

def evaluate(code_path, results_path):
    # Run the solution
    result = subprocess.run(
        [sys.executable, code_path],
        capture_output=True,
        text=True,
        timeout=10
    )
    # Extract fitness from output
    fitness = float(result.stdout.split(':')[1].strip())
    # Save results
    with open(results_path, 'w') as f:
        json.dump({'fitness': fitness}, f)

if __name__ == '__main__':
    evaluate(sys.argv[1], sys.argv[2])

4. Create config (my_problem/config.yaml):
SEED: 42
CODEBASE_PATH: "."
EVAL_FILE_NAME: "evaluate.py"
INIT_FILE_DATA:
  filename: "solution.py"
  language: "python"
SYS_MSG: |
  # PROMPT-BLOCK-START
  You are an expert optimization algorithm designer. Your goal is to modify
  the given code to maximize the objective function.
  # PROMPT-BLOCK-END
ENSEMBLE:
  - model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
    temp: 0.8
    weight: 1
SAMPLER_AUX_LM:
  model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
  temp: 0.7
EVOLVE_CONFIG:
  num_epochs: 50
  num_islands: 2
  migration_topology: "directed_ring"
  migration_interval: 10
  selection_policy: "tournament"
  selection_kwargs:
    tournament_size: 3
  num_inspirations: 2
  exploration_rate: 0.3
  fitness_key: "fitness"
  ckpt: 10
  early_stopping_rounds: 20

5. Run:
codeevolve \
  --inpt_dir=my_problem/input \
  --cfg_path=my_problem/config.yaml \
  --out_dir=my_problem/results \
  --terminal_logging

CodeEvolve is designed for algorithmic problems with quantifiable metrics. To apply it to your domain:
1. Define your evaluation function that measures solution quality
   - Must accept: code file path, results file path
   - Must output: JSON with at least one metric field
2. Specify the initial codebase or problem structure
   - Mark evolution zones with # EVOLVE-BLOCK-START and # EVOLVE-BLOCK-END
   - Code outside these blocks is preserved
3. Configure evolutionary parameters
   - Population size, mutation rates, selection policy
   - Exploration/exploitation balance
   - Migration topology and frequency
4. Choose your LLM ensemble composition
   - Single model or multiple models
   - Separate ensembles for exploration vs exploitation
   - Weight distribution across models
See problems/problem_template for a general template. Comprehensive tutorials and example notebooks will be released soon.
The framework is suitable for any domain where solutions can be represented as code and evaluated programmatically:
- Finding solutions to open problems in mathematics
- Discovering new inequalities or bounds
- Constructing optimal geometric configurations
- Optimizing computational kernels and scheduling algorithms
- Discovering novel heuristics for NP-hard problems
- Automated algorithm configuration
- Exploring hypothesis spaces expressed as executable code
- Parameter optimization for scientific models
- Automated experimental design
- Performance tuning of critical code paths
- Automatic parallelization strategies
- Resource allocation optimization
For complete experimental configurations, benchmark implementations, and step-by-step examples demonstrating how to run CodeEvolve on various problems, visit our experiments repository:
github.com/inter-co/science-codeevolve-experiments
This companion repository contains all code necessary to reproduce the results from our technical report, including:
- All benchmark problem implementations
- Experimental configurations for each problem
- Raw results and checkpoints from paper runs
- Analysis notebooks with visualizations
- Statistical comparisons with AlphaEvolve
Key configuration parameters in your YAML file:
EVOLVE_CONFIG:
  # Basic settings
  num_epochs: 100                 # Total iterations
  num_islands: 4                  # Parallel populations
  init_pop: 20                    # Initial population per island
  max_size: 50                    # Max population size (null = unlimited)
  # Selection
  selection_policy: "tournament"  # or "roulette", "random", "best"
  selection_kwargs:
    tournament_size: 3
  # Exploration/Exploitation
  exploration_rate: 0.3           # Probability of exploration
  use_scheduler: true             # Use adaptive scheduling
  type: "ExponentialDecayScheduler"
  scheduler_kwargs:
    decay_rate: 0.995
  # Operators
  num_inspirations: 3             # Inspiration programs for crossover
  meta_prompting: true            # Enable prompt evolution
  max_chat_depth: 5               # Conversation history depth
  # Migration
  migration_topology: "directed_ring"  # Island topology
  migration_interval: 20          # Epochs between migrations
  migration_rate: 0.1             # Fraction to migrate
  # Quality-Diversity (optional)
  use_map_elites: false           # Enable MAP-Elites
  # Checkpointing
  ckpt: 10                        # Checkpoint frequency
  early_stopping_rounds: 50       # Stop after N epochs without improvement
  # Fitness
  fitness_key: "fitness"          # Metric name from evaluation JSON

codeevolve [OPTIONS]
Required:
  --inpt_dir PATH       Input directory with solution and evaluator
  --out_dir PATH        Output directory for results
Optional:
  --cfg_path PATH       Config file (required for new runs)
  --load_ckpt INT       Checkpoint: 0=new, -1=latest, N=epoch N
  --terminal_logging    Show live progress in terminal

output_directory/
├── config.yaml # Copy of configuration used
├── 0/ # Island 0 results
│ ├── results.log # Detailed execution log
│ ├── best_sol.py # Best solution found
│ ├── best_prompt.txt # Best prompt evolved
│ └── ckpt/ # Checkpoints
│ ├── ckpt_10.pkl
│ ├── ckpt_20.pkl
│ └── ...
├── 1/ # Island 1 results
│ └── ...
└── ...
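Given the layout above, a small helper can report the most recent checkpoint epoch saved for each island. This is a convenience sketch, not part of CodeEvolve: the function `latest_checkpoints` is hypothetical and relies only on the directory and file naming shown in the tree (numeric island directories, `ckpt/ckpt_<epoch>.pkl`).

```python
import re
import sys
from pathlib import Path

CKPT_RE = re.compile(r"^ckpt_(\d+)\.pkl$")


def latest_checkpoints(out_dir):
    """Map island directory name -> highest checkpoint epoch found, or None."""
    latest = {}
    for island in sorted(Path(out_dir).iterdir()):
        # Island directories are named by index ("0", "1", ...); skip config.yaml etc.
        if not (island.is_dir() and island.name.isdigit()):
            continue
        ckpt_dir = island / "ckpt"
        epochs = []
        if ckpt_dir.is_dir():
            for path in ckpt_dir.iterdir():
                match = CKPT_RE.match(path.name)
                if match:
                    epochs.append(int(match.group(1)))
        latest[island.name] = max(epochs, default=None)
    return latest


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for island, epoch in latest_checkpoints(sys.argv[1]).items():
            print(f"island {island}: latest checkpoint epoch = {epoch}")
```

The reported epoch is the kind of value one might pass to --load_ckpt when resuming from a specific checkpoint.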
We organize the different versions of CodeEvolve as releases in both this repository and its companion experiments repository. Currently, we have the following releases:
- v0.1.0: Initial version of CodeEvolve, corresponds to v1 of technical report
- v0.2.0 / v0.2.1: Most recent release, corresponds to v3 of technical report with minor bug fixes
We welcome contributions from the community! Here's how to get involved:
1. Start with an issue: Browse existing issues or create a new one describing your proposed change
2. Submit a pull request:
   - Fork the repository
   - Create a feature branch
   - Make your changes with tests
   - Reference the issue in your PR description
3. Keep PRs focused: Avoid massive changes; smaller, well-tested contributions are easier to review
4. Maintain quality: Ensure code is tested, documented, and follows existing style
Please refer to CONTRIBUTING.md for detailed guidelines.
- New selection policies or evolutionary operators
- Additional LLM providers and integrations
- Benchmark problems from your domain
- Documentation improvements and tutorials
- Performance optimizations
- Bug fixes and test coverage
If you use CodeEvolve in your research, please cite our paper:
@article{assumpção2025codeevolveopensourceevolutionary,
  title={CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization},
  author={Henrique Assumpção and Diego Ferreira and Leandro Campos and Fabricio Murai},
  year={2025},
  eprint={2510.14150},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2510.14150},
}

The authors thank Bruno Grossi for his continuous support during the development of this project. We thank Fernando Augusto and Tiago Machado for useful conversations about possible applications of CodeEvolve. We also thank the OpenEvolve community for their inspiration and discussion about evolutionary coding agents.
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0.
This is not an official Inter product.
