Skip to content

tavis1536-art/science-codeevolve

 
 

Repository files navigation

CodeEvolve

License arxiv version

An open-source framework that combines large language models with evolutionary algorithms to discover and optimize high-performing code solutions.

CodeEvolve democratizes algorithmic discovery by making LLM-driven evolutionary search transparent, reproducible, and accessible. Whether you're tackling combinatorial optimization, discovering novel algorithms, or optimizing computational kernels, CodeEvolve provides a modular foundation for automated code synthesis guided by quantifiable metrics.

Table of Contents

Why CodeEvolve?

State-of-the-art performance with transparency. CodeEvolve matches or exceeds the performance of closed-source systems like Google DeepMind's AlphaEvolve on established algorithm-discovery benchmarks, while remaining fully open and reproducible.

Cost-effective solutions. Open-weight models like Qwen often match or outperform expensive closed-source LLMs at a fraction of the compute cost, making cutting-edge algorithmic discovery accessible to researchers and practitioners with limited budgets.

Designed for real problems. CodeEvolve addresses meta-optimization tasks where you need to discover programs that solve complex optimization problems—from mathematical constructions to scientific discovery.

Key Features

Islands-based Genetic Algorithm

Multiple populations evolve independently and periodically exchange top performers, maintaining diversity while propagating successful solutions across the search space. This parallel architecture enables efficient exploration and scales naturally to concurrent evaluation.

Modular Evolutionary Operators

Inspiration-based Crossover: Contextual recombination that combines successful solution patterns while preserving semantic coherence. Parent solutions are presented to the LLM along with high-performing "inspiration" programs, allowing it to synthesize novel combinations.

Meta-prompting Exploration: Evolves the prompts themselves, enabling the LLM to reflect on and rewrite its own instructions for more diverse search trajectories. The system maintains a population of prompts that co-evolve with solutions.

Depth-based Exploitation: Targeted refinement mechanism that makes precise edits to promising solutions by maintaining conversation history. The LLM sees the full evolutionary lineage, enabling incremental improvements while preserving working components.

Quality-Diversity Optimization

MAP-Elites Integration: Optional quality-diversity archive that maintains behavioral diversity across the solution space. Supports both grid-based and CVT-based (Centroidal Voronoi Tessellation) feature maps.

Flexible LLM Integration

Ensemble Support: Mix and match multiple LLMs with weighted selection. Use different models for exploration vs. exploitation phases, or combine open-weight and proprietary models.

OpenAI-Compatible APIs: Works with any OpenAI-compatible endpoint including vLLM, Ollama, Together AI, and cloud providers.

How It Works

CodeEvolve operates as a distributed evolutionary algorithm where code itself is the evolving entity:

The Evolutionary Loop

  1. Initialization: Start with an initial code template and system prompt that defines the task

  2. Selection: Choose parent programs based on fitness

    • Exploration mode: Random or uniform selection for broad search
    • Exploitation mode: Tournament or roulette selection for refinement
  3. Variation: Generate new candidates through LLM-driven operations

    • Exploration: Broad modifications with meta-prompting
    • Exploitation: Targeted improvements with conversation history and inspiration programs
  4. Code Generation: LLM produces SEARCH/REPLACE diffs

    • Only specified code blocks are modified (between markers)
    • Preserves working code outside evolution zones
    • Applies structured diffs rather than regenerating entire files
  5. Evaluation: Execute in sandboxed environment

    • Resource limits (time, memory)
    • Capture metrics and errors
    • Extract fitness from evaluation results
  6. Migration: Periodically exchange top solutions between islands

    • Maintains diversity while spreading innovations
    • Prevents premature convergence
  7. Archiving (optional): MAP-Elites maintains diverse solutions

    • Preserves behavioral variety
    • Enables multi-objective optimization

Exploration vs Exploitation

The system dynamically balances exploration and exploitation:

Phase Selection LLM Context Operators
Exploration Random/uniform No history Meta-prompting enabled, no inspirations
Exploitation Tournament/fitness Full lineage Inspiration programs, deep conversation

The exploration rate is controlled by a scheduler (e.g., exponential decay) and can adapt based on fitness improvements.

Distributed Islands Architecture

┌─────────┐     ┌─────────┐     ┌─────────┐
│Island 0 │────▶│Island 1 │────▶│Island 2 │
│Pop: 20  │     │Pop: 20  │     │Pop: 20  │
└────┬────┘     └────┬────┘     └────┬────┘
     │               │               │
     └───────────────┴───────────────┘
        Periodic Migration (Ring Topology)
        
Each island maintains:
- Solution population
- Prompt population (if meta-prompting enabled)
- Local fitness rankings
- Migration history

Architecture

CodeEvolve operates through an iterative process at each epoch:

  1. Population Management: Each island maintains populations of prompts and solutions, evaluated against user-defined fitness metrics

  2. Evolutionary Operators: Generate new candidates through crossover, mutation, and meta-prompting

  3. LLM Ensemble: Transforms operator instructions into executable code modifications via structured SEARCH/REPLACE diffs

  4. Selection & Migration: Top performers are retained and periodically migrated between islands

  5. Archive: MAP-Elites-based archive preserves behavioral diversity across the search

Execution feedback and fitness signals guide the entire loop, translating LLM proposals into testable, executable artifacts.

Core Components

Program Database (database.py)

  • Manages populations with genealogical tracking
  • Implements selection strategies: random, tournament, roulette, best
  • Optional MAP-Elites integration (Grid or CVT-based)
  • Automatic population size management

Evolution Engine (evolution.py)

  • Main evolutionary loop coordinating all components
  • Handles parent selection, variation, and evaluation
  • Manages exploration/exploitation scheduling
  • Checkpoint creation and restoration

Islands Coordinator (islands.py)

  • Distributed execution with multiprocessing
  • Synchronous or asynchronous migration
  • Shared global best tracking
  • Coordinated early stopping

LLM Interface (lm.py)

  • OpenAI-compatible API wrapper
  • Ensemble with weighted random selection
  • Automatic retry with exponential backoff
  • Embedding generation support

Prompt Sampler (prompt/sampler.py)

  • Builds conversation histories from program lineages
  • Incorporates inspiration programs for crossover
  • Meta-prompting for prompt evolution
  • Dynamic depth control

Evaluator (evaluator.py)

  • Sandboxed program execution
  • Memory and timeout monitoring
  • Process tree management
  • Metrics extraction from JSON results

Performance Highlights

CodeEvolve demonstrates superior performance on several benchmarks previously used to assess AlphaEvolve:

  • Competitive or better results across diverse algorithm-discovery tasks including autocorrelation inequalities, packing problems, and Heilbronn problems
  • Open-weight models (e.g., Qwen) matching closed-source performance at significantly lower cost
  • Extensive ablations quantifying each component's contribution to search efficiency

For comprehensive evaluation details and specific results, see our technical report.

Quick Start

Installation

Clone this repository and create the conda environment:

git clone https://github.com/inter-co/science-codeevolve.git
cd science-codeevolve
conda env create -f environment.yml
conda activate codeevolve

Basic Usage

Configure your LLM provider by setting environment variables:

export API_KEY=your_api_key_here
export API_BASE=your_api_base_url

Run CodeEvolve via the command line:

codeevolve \
  --inpt_dir=INPT_DIR \
  --cfg_path=CFG_PATH \
  --out_dir=RESULTS_DIR \
  --load_ckpt=LOAD_CKPT \
  --terminal_logging

Arguments:

  • --inpt_dir: Directory containing initial solution and evaluation script
  • --cfg_path: Path to YAML configuration file (required for new runs)
  • --out_dir: Directory where results will be saved
  • --load_ckpt: Checkpoint to load (0 for new run, -1 for latest, or specific epoch)
  • --terminal_logging: Enable live progress display (optional)

The scripts/run.sh provides a bash script for running CodeEvolve with taskset to limit CPU usage. See src/codeevolve/cli.py for further details.

Minimal Example

Here's a complete minimal example for optimizing a simple function:

1. Create problem directory:

mkdir -p my_problem/input

2. Create initial solution (my_problem/input/solution.py):

# EVOLVE-BLOCK-START
def objective(x):
    """Function to maximize."""
    return -(x - 3)**2 + 10
# EVOLVE-BLOCK-END

if __name__ == '__main__':
    x = 0.0  # Initial guess
    result = objective(x)
    print(f"Result: {result}")

3. Create evaluator (my_problem/input/evaluate.py):

import sys
import json
import subprocess

def evaluate(code_path, results_path):
    # Run the solution
    result = subprocess.run(
        [sys.executable, code_path],
        capture_output=True,
        text=True,
        timeout=10
    )
    
    # Extract fitness from output
    fitness = float(result.stdout.split(':')[1].strip())
    
    # Save results
    with open(results_path, 'w') as f:
        json.dump({'fitness': fitness}, f)

if __name__ == '__main__':
    evaluate(sys.argv[1], sys.argv[2])

4. Create config (my_problem/config.yaml):

SEED: 42

CODEBASE_PATH: "."
EVAL_FILE_NAME: "evaluate.py"

INIT_FILE_DATA:
  filename: "solution.py"
  language: "python"

SYS_MSG: |
  # PROMPT-BLOCK-START
  You are an expert optimization algorithm designer. Your goal is to modify
  the given code to maximize the objective function.
  # PROMPT-BLOCK-END

ENSEMBLE:
  - model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
    temp: 0.8
    weight: 1

SAMPLER_AUX_LM:
  model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
  temp: 0.7

EVOLVE_CONFIG:
  num_epochs: 50
  num_islands: 2
  migration_topology: "directed_ring"
  migration_interval: 10
  
  selection_policy: "tournament"
  selection_kwargs:
    tournament_size: 3
  
  num_inspirations: 2
  exploration_rate: 0.3
  
  fitness_key: "fitness"
  ckpt: 10
  early_stopping_rounds: 20

5. Run:

codeevolve \
  --inpt_dir=my_problem/input \
  --cfg_path=my_problem/config.yaml \
  --out_dir=my_problem/results \
  --terminal_logging

Customizing for Your Problem

CodeEvolve is designed for algorithmic problems with quantifiable metrics. To apply it to your domain:

  1. Define your evaluation function that measures solution quality

    • Must accept: code file path, results file path
    • Must output: JSON with at least one metric field
  2. Specify the initial codebase or problem structure

    • Mark evolution zones with # EVOLVE-BLOCK-START and # EVOLVE-BLOCK-END
    • Code outside these blocks is preserved
  3. Configure evolutionary parameters

    • Population size, mutation rates, selection policy
    • Exploration/exploitation balance
    • Migration topology and frequency
  4. Choose your LLM ensemble composition

    • Single model or multiple models
    • Separate ensembles for exploration vs exploitation
    • Weight distribution across models

See problems/problem_template for a general template. Comprehensive tutorials and example notebooks will be released soon.

Use Cases

The framework is suitable for any domain where solutions can be represented as code and evaluated programmatically:

Mathematical Discovery

  • Finding solutions to open problems in mathematics
  • Discovering new inequalities or bounds
  • Constructing optimal geometric configurations

Algorithm Design

  • Optimizing computational kernels and scheduling algorithms
  • Discovering novel heuristics for NP-hard problems
  • Automated algorithm configuration

Scientific Discovery

  • Exploring hypothesis spaces expressed as executable code
  • Parameter optimization for scientific models
  • Automated experimental design

Software Optimization

  • Performance tuning of critical code paths
  • Automatic parallelization strategies
  • Resource allocation optimization

Reproducing Research Results

For complete experimental configurations, benchmark implementations, and step-by-step examples demonstrating how to run CodeEvolve on various problems, visit our experiments repository:

github.com/inter-co/science-codeevolve-experiments

This companion repository contains all code necessary to reproduce the results from our technical report, including:

  • All benchmark problem implementations
  • Experimental configurations for each problem
  • Raw results and checkpoints from paper runs
  • Analysis notebooks with visualizations
  • Statistical comparisons with AlphaEvolve

Documentation

Configuration Reference

Key configuration parameters in your YAML file:

EVOLVE_CONFIG:
  # Basic settings
  num_epochs: 100              # Total iterations
  num_islands: 4               # Parallel populations
  init_pop: 20                 # Initial population per island
  max_size: 50                 # Max population size (null = unlimited)
  
  # Selection
  selection_policy: "tournament"  # or "roulette", "random", "best"
  selection_kwargs:
    tournament_size: 3
  
  # Exploration/Exploitation
  exploration_rate: 0.3        # Probability of exploration
  use_scheduler: true          # Use adaptive scheduling
  type: "ExponentialDecayScheduler"
  scheduler_kwargs:
    decay_rate: 0.995
  
  # Operators
  num_inspirations: 3          # Inspiration programs for crossover
  meta_prompting: true         # Enable prompt evolution
  max_chat_depth: 5            # Conversation history depth
  
  # Migration
  migration_topology: "directed_ring"  # Island topology
  migration_interval: 20       # Epochs between migrations
  migration_rate: 0.1          # Fraction to migrate
  
  # Quality-Diversity (optional)
  use_map_elites: false        # Enable MAP-Elites
  
  # Checkpointing
  ckpt: 10                     # Checkpoint frequency
  early_stopping_rounds: 50    # Stop after N epochs without improvement
  
  # Fitness
  fitness_key: "fitness"       # Metric name from evaluation JSON

Command-Line Reference

codeevolve [OPTIONS]

Required:
  --inpt_dir PATH          Input directory with solution and evaluator
  --out_dir PATH           Output directory for results

Optional:
  --cfg_path PATH          Config file (required for new runs)
  --load_ckpt INT          Checkpoint: 0=new, -1=latest, N=epoch N
  --terminal_logging       Show live progress in terminal

Output Structure

output_directory/
├── config.yaml              # Copy of configuration used
├── 0/                       # Island 0 results
│   ├── results.log          # Detailed execution log
│   ├── best_sol.py          # Best solution found
│   ├── best_prompt.txt      # Best prompt evolved
│   └── ckpt/                # Checkpoints
│       ├── ckpt_10.pkl
│       ├── ckpt_20.pkl
│       └── ...
├── 1/                       # Island 1 results
│   └── ...
└── ...

Releases

We organize the different versions of CodeEvolve as releases in both this repository and its companion experiments repository. Currently, we have the following releases:

  1. v0.1.0: Initial version of CodeEvolve, corresponds to v1 of technical report
  2. v0.2.0 / v0.2.1: Most recent release, corresponds to v3 of technical report with minor bug fixes

Contributing

We welcome contributions from the community! Here's how to get involved:

  1. Start with an issue: Browse existing issues or create a new one describing your proposed change

  2. Submit a pull request:

    • Fork the repository
    • Create a feature branch
    • Make your changes with tests
    • Reference the issue in your PR description
  3. Keep PRs focused: Avoid massive changes—smaller, well-tested contributions are easier to review

  4. Maintain quality: Ensure code is tested, documented, and follows existing style

Please refer to CONTRIBUTING.md for detailed guidelines.

Areas for Contribution

  • New selection policies or evolutionary operators
  • Additional LLM providers and integrations
  • Benchmark problems from your domain
  • Documentation improvements and tutorials
  • Performance optimizations
  • Bug fixes and test coverage

Citation

If you use CodeEvolve in your research, please cite our paper:

@article{assumpção2025codeevolveopensourceevolutionary,
      title={CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization},
      author={Henrique Assumpção and Diego Ferreira and Leandro Campos and Fabricio Murai},
      year={2025},
      eprint={2510.14150},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2510.14150},
}

Acknowledgements

The authors thank Bruno Grossi for his continuous support during the development of this project. We thank Fernando Augusto and Tiago Machado for useful conversations about possible applications of CodeEvolve. We also thank the OpenEvolve community for their inspiration and discussion about evolutionary coding agents.

License and Disclaimer

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0.

This is not an official Inter product.

About

CodeEvolve is an open-source evolutionary coding agent for algorithm discovery and optimization.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 99.6%
  • Shell 0.4%