A system that integrates human expert context with an evolutionary LLM agent.
- Project Website: https://quanquancliu.com/ParEVO/index.html
- Paper: ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution
If you use ParEVO or the Parlay-Instruct Corpus in your research, please use the following citation:
```bibtex
@inproceedings{yang2026parevo,
  title={ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution},
  author={Yang, Liu and Nie, Zeyu and Liu, Andrew and Zou, Felix and Altinb{\"u}ken, Deniz and Yazdanbakhsh, Amir and Liu, Quanquan C.},
  booktitle={arXiv Preprint},
  year={2026}
}
```

We use the PBBS Benchmarks (https://cmuparlay.github.io/pbbsbench/), where for each benchmark the suite provides:
- the specification of the input and expected output for the problem,
- the specification of a default set of input instances,
- code for generating inputs (written to a file),
- code for checking correctness of output (read from a file),
- code for timing the benchmark across the instances,
- a default parallel implementation,
- a default sequential implementation (for most benchmarks),
- a variety of other implementations (for some benchmarks).
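The generator, timing driver, and checker listed above can be scripted; the following is a minimal sketch in which the binary and file names are placeholders (not part of PBBS), and only the command-line shapes follow the suite's conventions:

```python
# Minimal sketch of driving a PBBS benchmark from Python. The binary and
# file names below are placeholders; only the command-line shapes follow
# the suite's documented usage.

def time_cmd(bench, infile, outfile=None, rounds=None):
    """Build a timing command: <bench> [-o <outfile>] [-r <numrounds>] <infile>."""
    cmd = [bench]
    if outfile is not None:
        cmd += ["-o", outfile]
    if rounds is not None:
        cmd += ["-r", str(rounds)]
    cmd.append(infile)
    return cmd

def check_cmd(checker, infile, outfile):
    """Build a checking command: <checker> <input> <output>."""
    return [checker, infile, outfile]

# Placeholder usage (uncomment and substitute real paths):
# import subprocess
# subprocess.run(time_cmd("./BFS", "graph.txt", outfile="out.txt", rounds=3), check=True)
# subprocess.run(check_cmd("../bench/BFSCheck", "graph.txt", "out.txt"), check=True)
```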
Create a Conda environment and install the required packages:

```
conda create -n gemini-env python=3.12
conda install pytorch torchvision torchaudio -c pytorch
conda install transformers
conda install -c conda-forge sentencepiece
```

- Navigate to `pbbsbench/benchmarks/breadthFirstSearch/deterministicBFS` and run `make` to generate the executable for the driver code. Note that the driver code is `BFSCheck.C` and `BFSTime.C` in `pbbsbench/benchmarks/breadthFirstSearch/bench`. If you need OpenMP, build with `make OPENMP=1`.
- Test it with a command line of the form:
```
<bench> [-o <outfile>] [-r <numrounds>] [-scale <num_of_thread>] <infile>
```

- Check the correctness: navigate to
`pbbsbench/benchmarks/breadthFirstSearch/bench` and run:

```
<checker> <input> <output>
```

- Within each implementation directory, you can also run
`./testInputs`. On a machine with multiple chips, using `numactl -i all ./testInputs` will give better results. `./testInputs_small` will use the smaller inputs. The `testInputs` script has several options, including:
```
-x : do not check the output
-r <count> : number of rounds to use
-p <count> : number of threads to use
```

The actual inputs are specified in the script and can be changed if desired. We can therefore change our BFS implementation, rebuild with `make`, and run `./testInputs -r <count> -p <count> >> <output_log>` to append the evaluation results to the output log. An example of the output:
```
randLocalGraph_J_10_20000000 : -r 3 -o /tmp/ofile10284_857347 : '2.257', '2.255', '2.257', geomean = 2.256
rMatGraph_J_12_16000000 : -r 3 -o /tmp/ofile859821_410881 : '2.476', '2.462', '2.467', geomean = 2.468
```

Create a Conda environment for the Hugging Face models:

```
conda create --name huggingface python=3.11.* transformers accelerate tokenizers datasets jupyter jupyterlab
```

We added Parlay and Cilk versions to the raw prompts with `create-parlay-raw-prompts.py` (which uses the Gemini API to translate the OpenMP prompt and function definition to their ParlayLib counterparts) and `update-prompts-concise.py` (which traverses the raw directory and replaces each OpenMP prompt with an OpenCilk one). Then run `gather-raw-prompts.py` to generate `generation-prompts.json`.
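The gather step can be sketched roughly as follows; the one-prompt-per-`.txt`-file layout and the `{name: prompt}` JSON schema are assumptions for illustration, not the actual format used by `gather-raw-prompts.py`:

```python
import json
from pathlib import Path

# Hypothetical sketch: swap OpenMP references for OpenCilk in each raw
# prompt, then collect all prompts into a single JSON file. The *.txt
# layout and {name: prompt} schema are assumptions, not the repo's format.

def gather_prompts(raw_dir, out_path):
    prompts = {}
    for path in sorted(Path(raw_dir).glob("*.txt")):
        prompts[path.stem] = path.read_text().replace("OpenMP", "OpenCilk")
    Path(out_path).write_text(json.dumps(prompts, indent=2))
    return prompts
```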
- Gemini: use `generate-gemini.py`
- Local models with vLLM: use `generate-vllm.py`
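Both scripts presumably iterate over `generation-prompts.json`; a minimal, backend-agnostic sketch, in which the `{name: prompt}` schema and the `generate` callable are assumptions rather than the scripts' actual interface:

```python
import json

# Hypothetical driver skeleton: load generation-prompts.json and pass each
# prompt to a model backend (Gemini API, vLLM, ...). The JSON schema and
# the `generate` callable are illustrative assumptions.

def run_generation(prompts_path, generate):
    with open(prompts_path) as f:
        prompts = json.load(f)  # assumed shape: {name: prompt_text}
    return {name: generate(text) for name, text in prompts.items()}

# Example with a stub backend:
# outputs = run_generation("generation-prompts.json", lambda p: "// code for: " + p)
```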