# EvoLite: Evolutionary Multi-Agent Workflow Optimization for LLMs

EvoLite is a framework that uses evolutionary algorithms to automatically discover and optimize multi-agent workflows for code-generation tasks. It balances performance (Pass@1) against efficiency (token cost) using NSGA-II multi-objective optimization.
## Features

- 🧬 Evolutionary Workflow Optimization: Uses genetic algorithms to evolve agent workflow topologies
- 🎯 Multi-Objective Optimization: NSGA-II-based Pareto optimization of the Pass@1 vs. token-cost trade-off
- 🤖 LLM-Driven Evolution: Semantic mutation and crossover using an LLM for intelligent topology modifications
- 📊 Benchmark Support: MBPP and MATH algebra benchmarks for evaluation
- 🔀 Flexible Workflow Design: Linear chains, loops (reflexion), branching (plan-and-solve), and test-driven patterns
- ⚡ High-Performance Evaluation: Async batch evaluation with configurable concurrency
## Project Structure

```
EvoLite/
├── src/
│   ├── agents/                 # Agent and Workflow definitions
│   │   ├── agent.py            # Base Agent class
│   │   ├── workflow.py         # Workflow orchestration using LangGraph
│   │   ├── block.py            # Block-based agent abstraction
│   │   └── extractors.py       # Answer extraction agents
│   ├── ga/                     # Genetic Algorithm implementations
│   │   ├── ga_llm.py           # LLM-enhanced GA (main algorithm)
│   │   ├── ga.py               # Basic GA implementation
│   │   ├── hdlo.py             # Hierarchical Design-Level Optimization
│   │   └── multi_objective.py  # NSGA-II utilities
│   ├── datasets/               # Benchmark dataset loaders
│   │   ├── mbpp.py             # MBPP dataset
│   │   └── math_algebra.py     # MATH algebra dataset
│   ├── evaluation/             # Evaluation utilities
│   ├── llm/                    # LLM client implementations
│   ├── client/                 # Evaluation server client
│   └── server/                 # FastAPI evaluation server
├── configs/                    # Configuration files
├── scripts/                    # Baseline scripts
│   ├── mbpp_baseline.py        # MBPP baseline solver
│   └── math_baseline.py        # MATH baseline solver
├── tests/                      # Test scripts
└── evaluate.py                 # Single workflow evaluation
```
## Workflow Representation

Workflows are represented as directed graphs using arrow syntax:

```
Planner -> Coder -> Reviewer -> Coder            # Reflexion loop
Planner -> CoderA, Planner -> CoderB -> Merger   # Branching
```
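The arrow syntax above can be read as a small edge-list grammar: chains are separated by commas, and each `->` pair is a directed edge. The parser below is an illustrative sketch of that reading, not EvoLite's actual implementation (which lives in `src/agents/workflow.py`):

```python
# Hypothetical sketch: turning the arrow syntax into an adjacency list.
# Function and variable names here are illustrative assumptions.
from collections import defaultdict

def parse_workflow(spec: str) -> dict[str, list[str]]:
    """Parse comma-separated 'A -> B -> C' chains into an edge map."""
    graph: dict[str, list[str]] = defaultdict(list)
    for chain in spec.split(","):
        nodes = [n.strip() for n in chain.split("->")]
        for src, dst in zip(nodes, nodes[1:]):
            if dst not in graph[src]:
                graph[src].append(dst)
        graph[nodes[-1]]  # ensure sink nodes appear as keys
    return dict(graph)

edges = parse_workflow("Planner -> CoderA, Planner -> CoderB -> Merger")
# edges["Planner"] == ["CoderA", "CoderB"]; "Merger" is a sink with no outgoing edges
```

Note that a "Reflexion loop" such as `Planner -> Coder -> Reviewer -> Coder` simply produces a back-edge (`Reviewer -> Coder`) in the same structure, so cycles need no special syntax.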
## Genetic Operators

- Mutation: add/remove agents, rewire connections
- Semantic mutation: an LLM suggests improvements
- Agnostic mutation: random structural changes
- Crossover: combine topologies from two parent workflows
- Distillation: transplant modules between workflows
- Mixing: merge parallel structures
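As a rough illustration of the structural ("agnostic") side of these operators, the sketch below mutates an edge-list genome by adding an agent, dropping an edge, or rewiring a connection. The role names and operator logic are assumptions for demonstration, not EvoLite's implementation:

```python
# Illustrative agnostic-mutation sketch on an edge-list genome.
# ROLES and the three operations are hypothetical examples.
import random

ROLES = ["Planner", "Coder", "Reviewer", "Tester", "Merger"]

def mutate(edges: list[tuple[str, str]], rng: random.Random) -> list[tuple[str, str]]:
    """Randomly add an agent, drop an edge, or rewire a connection."""
    edges = list(edges)  # copy so the parent genome is untouched
    op = rng.choice(["add_agent", "remove_edge", "rewire"])
    if op == "add_agent":
        # Splice a new role into a random edge: A -> B becomes A -> X -> B.
        a, b = edges.pop(rng.randrange(len(edges)))
        x = rng.choice(ROLES)
        edges += [(a, x), (x, b)]
    elif op == "remove_edge" and len(edges) > 1:
        edges.pop(rng.randrange(len(edges)))
    else:
        # Rewire: point an existing edge at a different target role.
        i = rng.randrange(len(edges))
        edges[i] = (edges[i][0], rng.choice(ROLES))
    return edges

child = mutate([("Planner", "Coder"), ("Coder", "Reviewer")], random.Random(0))
```

Semantic mutation differs only in who proposes the change: instead of a random draw, an LLM is shown the current topology and asked for an improved one.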
## Evaluation and Selection

Each workflow is evaluated on two objectives:

- Pass@1: code correctness on benchmark problems
- Token cost: total tokens used for inference

Selection then applies:

- Non-dominated sorting to identify Pareto fronts
- Crowding distance for diversity preservation
- Elitism with a buffer (probation) mechanism
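To make the selection machinery concrete, here is a minimal sketch of NSGA-II-style non-dominated sorting and crowding distance on (Pass@1, token cost) pairs, where Pass@1 is maximized and cost minimized. It mirrors `src/ga/multi_objective.py` in spirit only; all names are assumptions:

```python
# Minimal NSGA-II selection sketch over (pass_at_1, token_cost) objectives.

def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """a dominates b: no worse on both objectives, strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better

def non_dominated_sort(pop: list[tuple[float, float]]) -> list[list[int]]:
    """Group population indices into Pareto fronts (front 0 = best)."""
    fronts: list[list[int]] = []
    remaining = set(range(len(pop)))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(pop[j], pop[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def crowding_distance(front: list[int], pop) -> dict[int, float]:
    """Larger distance = more isolated point = preferred for diversity."""
    dist = {i: 0.0 for i in front}
    for k in range(2):  # per objective
        ordered = sorted(front, key=lambda i: pop[i][k])
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep boundary points
        span = pop[ordered[-1]][k] - pop[ordered[0]][k] or 1.0
        for lo, mid, hi in zip(ordered, ordered[1:], ordered[2:]):
            dist[mid] += (pop[hi][k] - pop[lo][k]) / span
    return dist

pop = [(0.8, 1200.0), (0.8, 900.0), (0.6, 400.0), (0.5, 2000.0)]
fronts = non_dominated_sort(pop)
# fronts == [[1, 2], [0], [3]]: (0.8, 900) dominates (0.8, 1200) at equal accuracy
# but lower cost, and (0.5, 2000) is dominated by every cheaper, more accurate point.
```

Survivors are taken front by front; within the last partially admitted front, higher crowding distance wins, which is what preserves diversity along the Pareto frontier.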
## Installation

```shell
git clone https://github.com/tejava317/EvoLite.git
cd EvoLite
conda env create -f environment.yml
conda activate evolite
cp .env.example .env
```

Edit `.env` and configure your API:

```shell
# Option 1: OpenAI API
OPENAI_API_KEY=your_openai_api_key

# Option 2: Local vLLM server (OpenAI-compatible)
VLLM_BASE_URL=http://localhost:8000/v1
```

## Running the Genetic Algorithm

```shell
# Run GA-LLM on the MBPP dataset
python -m src.ga.ga_llm \
    --task MBPP \
    --population-size 100 \
    --generation 15 \
    --num-problem 30 \
    --server-url http://localhost:8001

# Run without LLM (random mutation only)
python -m src.ga.ga_llm --task MBPP --no-llm --fast
```

Key arguments:
| Argument | Default | Description |
|---|---|---|
| `--population-size` | 100 | Number of workflows per generation |
| `--generation` | 10 | Number of evolutionary generations |
| `--num-problem` | 30 | Problems for fitness evaluation |
| `--elite-ratio` | 0.2 | Fraction of population preserved |
| `--buffer-size` | 10 | Probation pool size |
| `--max-eval-iter` | 4 | Max evaluations per individual |
| `--no-llm` | - | Disable LLM-based operators |
| `--fast` | - | Use random fitness (for testing) |
## Evaluating Workflows

```shell
# Evaluate the default workflow on MBPP
python evaluate.py --task MBPP --num-problems 50

# Custom workflow
python evaluate.py --task MBPP \
    --roles "Task Parsing Agent,Code Generation Agent,Code Reviewer Agent" \
    --show-intermediate

# Single-agent baseline on MBPP
python scripts/mbpp_baseline.py --num_problems 100 --model "Qwen/Qwen3-4B"

# Start the FastAPI server for batch evaluation
uvicorn src.server.app:app --host 0.0.0.0 --port 8001
```

## Checkpoints and Visualization

Evolution checkpoints are saved to `src/ga/ga_llm_checkpoints/`:

```
population_<run_id>_gen0.csv
population_<run_id>_gen1.csv
...
population_<run_id>_final.csv
```

Pareto-front visualizations are saved to `src/ga/ga_llm_graph/`:

```
<run_id>_gen0.png
<run_id>_gen1.png
...
<run_id>_final.png
```
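A checkpoint CSV can be post-processed to recover the Pareto front of any generation. The sketch below assumes hypothetical column names (`topology`, `pass_at_1`, `token_cost`); check the actual CSV header of your checkpoints before using it:

```python
# Hedged sketch: load a generation checkpoint and filter its Pareto front.
# The column names are assumptions, not a documented schema.
import csv

def load_checkpoint(path: str) -> list[dict]:
    """Read a checkpoint CSV and coerce the objective columns to float."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    for r in rows:
        r["pass_at_1"] = float(r["pass_at_1"])
        r["token_cost"] = float(r["token_cost"])
    return rows

def pareto_front(rows: list[dict]) -> list[dict]:
    """Keep rows not dominated on (maximize pass_at_1, minimize token_cost)."""
    def dom(a, b):
        return (a["pass_at_1"] >= b["pass_at_1"] and a["token_cost"] <= b["token_cost"]
                and (a["pass_at_1"] > b["pass_at_1"] or a["token_cost"] < b["token_cost"]))
    return [r for r in rows if not any(dom(o, r) for o in rows if o is not r)]
```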
## Configuration

Define custom agent roles in `configs/generated_prompts.yaml`:

```yaml
agents:
  Task Parsing Agent:
    prompt: "You are a task parsing agent..."
  Code Generation Agent:
    prompt: "You are a code generation agent..."
```

Configure benchmark tasks in `configs/task_descriptions.yaml`:

```yaml
MBPP:
  description: "Python programming problems from MBPP..."
MATH:
  description: "Algebra problems from MATH dataset..."
```

GA hyperparameters:

| Parameter | Value | Description |
|---|---|---|
| `POPULATION_SIZE` | 100 | Individuals per generation |
| `MUTATION_RATE` | 0.7 | Probability of mutation |
| `CROSSOVER_RATE` | 0.3 | Probability of crossover |
| `AGNOSTIC_RATIO` | 0.2 | Fraction of random (vs. semantic) mutations |
| `LLM_CALL_BUDGET` | 500 | Max LLM calls during evolution |
## Workflow Patterns

- Linear chain: `A -> B -> C`
- Reflexion loop: `Solver -> Reviewer -> Solver`
- Branching: `Planner -> [WorkerA, WorkerB] -> Merger`
- Test-driven: `TestGen -> CodeGen -> Verify`
## License

MIT License

## Citation

If you use EvoLite in your research, please cite:

```bibtex
@software{evolite2025,
  title  = {EvoLite: Evolutionary Multi-Agent Workflow Optimization},
  author = {EvoLite Team},
  year   = {2025},
  url    = {https://github.com/tejava317/EvoLite}
}
```