SEAL: Synergistic Co-Evolution of Agents and Learning Environments

Tool-Use Agents · Self-Evolution · Reinforcement Learning

Yihao Hu [1,2], Zhihao Wen* [1], Xiujin Liu [3], Pan Wang [1,4], Xin Zhang [1], Wei Wu [1]

*Corresponding author · [1] Ant Group · [2] Westlake University · [3] University of Michigan-Ann Arbor · [4] University of Science and Technology of China

[Figure: SEAL overview]

Overview

SEAL is a closed-loop co-evolution framework for interactive tool-use agents. It collects on-policy trajectories under executable verification, assigns turn-level failure labels to failed rollouts, and uses these diagnoses as a shared signal for both training-time interface evolution and model-side policy optimization.

In SEAL, the agent reveals its capability gaps, the learning interface adapts around these failures, and the policy internalizes the resulting feedback through GRPO. Evaluation remains strict: tool semantics, task labels, and the verifier are unchanged.

The released SEAL-7B model is available on Hugging Face: mis2/SEAL-7B.

[Figure: SEAL closed-loop co-evolution framework]

Highlights

  • Verifier-grounded diagnosis: executable traces are mapped to failure types such as invalid tool calls, argument mismatches, missing tool calls, recovery failures, and response mismatches.
  • Training-time interface evolution: BFCL observations expose schema affordances and recovery-oriented feedback without changing the test-time environment.
  • Diagnosis-guided optimization: diagnostic profiles reweight GRPO advantages while preserving the original verifier reward.
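To make the last highlight concrete, here is a minimal sketch of group-relative advantages scaled by a per-rollout diagnostic weight. It is not the repository's implementation: the function names, the weight values, and the choice of multiplicative scaling are assumptions made for illustration. The verifier rewards are left untouched; only the advantages are rescaled.

```python
from statistics import mean, pstdev

# Hypothetical weights per failure label; SEAL's actual values are not shown here.
FAILURE_WEIGHTS = {
    "invalid_tool_call": 1.5,
    "argument_mismatch": 1.3,
    "missing_tool_call": 1.2,
    "recovery_failure": 1.4,
    "response_mismatch": 1.1,
}


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize rewards within one prompt's rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def reweight(advantages, labels):
    """Scale each advantage by its rollout's diagnostic weight (None = no failure diagnosed)."""
    return [a * FAILURE_WEIGHTS.get(label, 1.0) for a, label in zip(advantages, labels)]


rewards = [1.0, 0.0, 0.0, 1.0]  # verifier rewards, never modified
labels = [None, "argument_mismatch", "invalid_tool_call", None]
advantages = reweight(grpo_advantages(rewards), labels)
print(advantages)
```

In this toy group, the two failed rollouts receive larger negative advantages than plain GRPO would assign, while the successful rollouts keep their standardized value.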

Reproducible Server Setup

The steps below mirror the AgentEvolver environment build: a Python 3.11 training environment with CUDA/flash-attn, plus a separate Python 3.11.13 BFCL environment service.

0. Prerequisites

  • Linux server with Conda, Git, CUDA-capable GPUs, and network access.
  • Activate your Miniconda installation first, for example: source ~/miniconda3/bin/activate.
  • Defaults: trainer.n_gpus_per_node=4, BFCL service at http://127.0.0.1:8082.
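A quick pre-flight check for the prerequisites above can be sketched with the standard library. The helper name is hypothetical, and using nvidia-smi as a stand-in for "CUDA-capable GPUs are visible" is an assumption; it only confirms that the driver utility is on PATH.

```python
import shutil


def missing_tools(tools=("conda", "git", "nvidia-smi")):
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("missing tools:", missing_tools() or "none")
```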

1. Clone the repository

git clone https://github.com/yihaohu0118/SEAL.git
cd SEAL

2. Build the SEAL training environment

Recommended:

source ~/miniconda3/bin/activate
bash install.sh
conda activate seal

Equivalent manual commands:

conda create -n seal python=3.11 -y
conda activate seal
conda install -y -c nvidia cuda-toolkit
python -m pip install --upgrade pip wheel "setuptools==68.2.2"
pip install -r requirements.txt
pip install --verbose flash-attn==2.7.4.post1 ring-flash-attn --no-build-isolation
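After either install path, a short sanity check can catch a missing package before training starts. The sketch below only tests whether modules are discoverable in the active environment, without importing them. The import names torch, flash_attn, and ring_flash_attn are assumed here; in particular, torch is assumed to come from requirements.txt.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages installed above.
REQUIRED = ["torch", "flash_attn", "ring_flash_attn"]

missing = missing_modules(REQUIRED)
print("missing modules:", missing if missing else "none")
```

Run it with the seal environment active; an empty result means all three modules are visible to that interpreter.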

If the server has slow Hugging Face access, set a mirror before installing or before the first model download:

export HF_ENDPOINT=https://hf-mirror.com

3. Build the BFCL environment service

This creates the bfcl Conda environment, clones the official Gorilla/BFCL repository, installs bfcl_eval, and generates the processed multi-turn BFCL data used by SEAL.

source ~/miniconda3/bin/activate
bash env_service/environments/bfcl/setup.sh

Verify the BFCL setup:

conda run -n bfcl python -c "import bfcl_eval; print('bfcl_eval ok')"
test -f env_service/environments/bfcl/bfcl_data/multi_turn_processed.jsonl
test -f data/bfcl_400_split.json
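The test -f checks only confirm that the files exist. A slightly stronger check, sketched below, also confirms that the processed file is non-empty and that every line parses as JSON. The helper name count_jsonl_records is hypothetical; the path is the one used above.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Count non-blank lines in a JSONL file; raise ValueError on the first invalid line."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc})") from exc
            count += 1
    return count


processed = Path("env_service/environments/bfcl/bfcl_data/multi_turn_processed.jsonl")
if processed.is_file():
    print(f"{processed}: {count_jsonl_records(processed)} records")
else:
    print(f"{processed} is missing; rerun the BFCL setup script")
```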

If bfcl_eval imports but multi_turn_processed.jsonl is missing, the data preprocessing step failed. Install the missing dependencies into the bfcl environment (not base), then rerun setup:

conda run -n bfcl pip install -r env_service/environments/bfcl/requirements.txt
bash env_service/environments/bfcl/setup.sh

4. Start the BFCL service

In terminal 1:

source ~/miniconda3/bin/activate
conda activate bfcl
export BFCL_HOST=127.0.0.1
export BFCL_PORT=8082
bash env_service/launch_script/bfcl.sh

Keep this process running. In another terminal, check that the service is up:

curl -fsS http://127.0.0.1:8082/healthz
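When scripting the two-terminal workflow, it can help to wait for the service rather than check once. The sketch below polls the same /healthz URL using only the standard library and treats an HTTP 200 as healthy. The function name, timeout, and polling interval are illustrative choices, and the response body is not inspected because its format is not documented here.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url, timeout_s=60.0, interval_s=2.0):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses; return True on success."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not accepting connections yet, or returned an error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, wait_for_service("http://127.0.0.1:8082/healthz") returns True once the BFCL service responds and False if it has not come up within a minute.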

5. Run SEAL training

In terminal 2:

source ~/miniconda3/bin/activate
cd SEAL
conda activate seal
python launcher.py --conf exp/SEAL.yaml

Useful server-specific overrides:

python launcher.py --conf exp/SEAL.yaml -- \
  trainer.n_gpus_per_node=8 \
  actor_rollout_ref.model.path=/path/to/Qwen2.5-7B-Instruct \
  env_service.env_url=http://127.0.0.1:8082
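The overrides after -- use dotted key=value paths. How the launcher forwards them is not shown here; the sketch below only illustrates the general idea of applying such overrides to a nested configuration dictionary, in the style popularized by Hydra. apply_overrides is a hypothetical helper, the base dictionary is made up, and values are kept as strings rather than type-converted.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted key=value overrides applied (values stay strings)."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, value = item.partition("=")  # split on the first '=' only
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate levels as needed
        node[leaf] = value
    return result


base = {"trainer": {"n_gpus_per_node": "4"}}  # illustrative defaults only
merged = apply_overrides(
    base,
    ["trainer.n_gpus_per_node=8", "env_service.env_url=http://127.0.0.1:8082"],
)
print(merged)
```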

The launcher writes a config/code backup to launcher_record/SEAL/, and training outputs are written under experiments/SEAL/.

6. View BFCL validation results

Validation generations are written as JSONL files under experiments/tech_synthetic/<experiment_name>/validation_log. Summarize per-category BFCL pass rates with:

TEST_PARQUET=data/bfcl_eval_400.parquet
BFCL_JSONL=env_service/environments/bfcl/bfcl_data/multi_turn_processed.jsonl

python3 scripts/stats_validation_bfcl.py \
  --val-dir experiments/tech_synthetic/SEAL/validation_log \
  --parquet "${TEST_PARQUET}" \
  --bfcl-jsonl "${BFCL_JSONL}"
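For readers who want to see what a per-category pass rate amounts to, the sketch below aggregates records that have already been reduced to a category and a pass/fail flag. It is not a substitute for scripts/stats_validation_bfcl.py: the record fields category and passed are hypothetical, the sample records are made up, and the real validation logs may be structured differently.

```python
from collections import defaultdict


def pass_rates(records):
    """Map each category to (passed, total, rate) over the given records."""
    counts = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for record in records:
        bucket = counts[record["category"]]
        bucket[0] += int(bool(record["passed"]))
        bucket[1] += 1
    return {cat: (p, n, p / n) for cat, (p, n) in sorted(counts.items())}


# Made-up records; field names and categories are placeholders.
sample = [
    {"category": "multi_turn_base", "passed": True},
    {"category": "multi_turn_base", "passed": False},
    {"category": "multi_turn_miss_func", "passed": True},
]
for category, (passed, total, rate) in pass_rates(sample).items():
    print(f"{category}: {passed}/{total} = {rate:.1%}")
```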

Optional: let the launcher start BFCL

If you want launcher.py to start BFCL automatically, configure .env first:

cp example.env .env

Edit BFCL_SCRIPT in .env so it uses your actual Conda base path:

BFCL_SCRIPT="source /path/to/miniconda/bin/activate bfcl; bash bfcl.sh"

Then run:

source ~/miniconda3/bin/activate
cd SEAL
conda activate seal
python launcher.py --conf exp/SEAL.yaml --with-bfcl

DASHSCOPE_API_KEY is not required for the released SEAL recipe because task_manager.n=0 and synthetic_data_ratio=0.0. Set it only if you enable synthetic task exploration.
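The rule above can be expressed as a small pre-flight check. This is a sketch under assumptions: the function names are hypothetical, treating either non-zero setting as "synthetic task exploration enabled" is an interpretation of the sentence above, and both values are passed in directly because the config location of synthetic_data_ratio is not stated here.

```python
import os


def needs_dashscope_key(task_manager_n, synthetic_data_ratio):
    """Assume synthetic task exploration is active when either setting is non-zero."""
    return task_manager_n > 0 or synthetic_data_ratio > 0.0


def check_api_key(task_manager_n, synthetic_data_ratio, env=os.environ):
    """Raise early if synthetic exploration is enabled but no API key is set."""
    if needs_dashscope_key(task_manager_n, synthetic_data_ratio) and not env.get("DASHSCOPE_API_KEY"):
        raise RuntimeError("DASHSCOPE_API_KEY must be set when synthetic task exploration is enabled")


# Released recipe: both settings are zero, so no key is needed and nothing is raised.
check_api_key(task_manager_n=0, synthetic_data_ratio=0.0)
```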

Repository Layout

exp/SEAL.yaml                         Full closed-loop SEAL co-evolution recipe
env_service/environments/bfcl/        Executable verifier and train-time interface evolution layer
agentevolver/module/tocf/             Failure diagnosis, capability-state tracking, and A-Patch advantage reweighting
agentevolver/module/task_manager/     BFCL task adaptation, reward routing, and diagnostic dense grader
data/                                 Low-resource BFCL train/evaluation splits used by SEAL

Acknowledgements

SEAL builds on ideas and infrastructure from modelscope/AgentEvolver and verl-project/verl. We sincerely thank the authors and contributors of these projects.

Citation

@article{hu2026seal,
  title={SEAL: Synergistic Co-Evolution of Agents and Learning Environments},
  author={Hu, Yihao and Wen, Zhihao and Liu, Xiujin and Wang, Pan and Zhang, Xin and Wu, Wei},
  journal={arXiv preprint arXiv:2605.24426},
  year={2026},
  url={https://arxiv.org/abs/2605.24426}
}

License

This project is released under the Apache License 2.0.
