Tool-Use Agents · Self-Evolution · Reinforcement Learning
Yihao Hu1,2, Zhihao Wen*,1, Xiujin Liu3, Pan Wang1,4, Xin Zhang1, Wei Wu1
*Corresponding Author · 1Ant Group · 2Westlake University · 3University of Michigan-Ann Arbor · 4University of Science and Technology of China
SEAL is a closed-loop co-evolution framework for interactive tool-use agents. It collects on-policy trajectories under executable verification, diagnoses failed rollouts into turn-level failure labels, and uses these diagnoses as a shared signal for both training-time interface evolution and model-side policy optimization.
In SEAL, the agent reveals its capability gaps, the learning interface adapts around these failures, and the policy internalizes the resulting feedback through GRPO. Evaluation remains strict: tool semantics, task labels, and the verifier are unchanged.
The released SEAL-7B model is available on Hugging Face: mis2/SEAL-7B.
- Verifier-grounded diagnosis: executable traces are mapped to failure types such as invalid tool calls, argument mismatches, missing tool calls, recovery failures, and response mismatches.
- Training-time interface evolution: BFCL observations expose schema affordances and recovery-oriented feedback without changing the test-time environment.
- Diagnosis-guided optimization: diagnostic profiles reweight GRPO advantages while preserving the original verifier reward.
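As a rough illustration of the last bullet, the sketch below computes group-relative (GRPO-style) advantages from unchanged verifier rewards and then scales each failed rollout's advantage by how common its diagnosed failure type is. This is not the A-Patch rule implemented in agentevolver/module/tocf/: the label strings, the `1 + alpha * frequency` weighting, and all function names are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev

# Failure types named in the list above; the identifiers are illustrative.
FAILURE_TYPES = (
    "invalid_tool_call",
    "argument_mismatch",
    "missing_tool_call",
    "recovery_failure",
    "response_mismatch",
)


def failure_profile(labels):
    """Return the fraction of diagnosed failures that fall into each type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in FAILURE_TYPES}


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def reweight(advantages, rollout_labels, profile, alpha=1.0):
    """Scale advantages by a diagnosis weight; rewards themselves are untouched.

    Hypothetical rule: a rollout labelled with failure type t gets weight
    1 + alpha * profile[t]; rollouts without a label keep weight 1.
    """
    weighted = []
    for adv, label in zip(advantages, rollout_labels):
        weight = (1.0 + alpha * profile.get(label, 0.0)) if label else 1.0
        weighted.append(adv * weight)
    return weighted


# One group of four rollouts for the same task, with 0/1 verifier rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
labels = [None, "argument_mismatch", "missing_tool_call", None]
profile = failure_profile(["argument_mismatch"] * 3 + ["missing_tool_call"])
print(reweight(grpo_advantages(rewards), labels, profile))
```

In this toy group the rollout with the more frequent failure type (argument_mismatch) receives the larger negative advantage, while the successful rollouts and the rewards are left as the verifier produced them.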
The steps below mirror the AgentEvolver environment build: a Python 3.11 training environment with CUDA/flash-attn, plus a separate Python 3.11.13 BFCL environment service.
- Linux server with Conda, Git, CUDA-capable GPUs, and network access.
- Activate your user Miniconda first, for example: source ~/miniconda3/bin/activate
- Defaults: trainer.n_gpus_per_node=4, BFCL service at http://127.0.0.1:8082.
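The prerequisites above can be checked before installing. The following is a minimal sketch, assuming that `conda`, `git`, and `nvidia-smi` on the PATH are reasonable proxies for Conda, Git, and a working NVIDIA driver; the tool list and the function name are illustrative and not part of the repository.

```python
import shutil

# Assumed command-line proxies for the prerequisites listed above.
REQUIRED_TOOLS = ("conda", "git", "nvidia-smi")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisite tools found on PATH.")
```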
git clone https://github.com/yihaohu0118/SEAL.git
cd SEAL

Recommended:
source ~/miniconda3/bin/activate
bash install.sh
conda activate seal

Equivalent manual commands:
conda create -n seal python=3.11 -y
conda activate seal
conda install -y -c nvidia cuda-toolkit
python -m pip install --upgrade pip wheel "setuptools==68.2.2"
pip install -r requirements.txt
pip install --verbose flash-attn==2.7.4.post1 ring-flash-attn --no-build-isolation

If the server has slow Hugging Face access, set a mirror before installing or before the first model download:
export HF_ENDPOINT=https://hf-mirror.com

The following setup script creates the bfcl Conda environment, clones the official Gorilla/BFCL repository, installs bfcl_eval, and generates the processed multi-turn BFCL data used by SEAL:
source ~/miniconda3/bin/activate
bash env_service/environments/bfcl/setup.sh

Verify the BFCL setup:
conda run -n bfcl python -c "import bfcl_eval; print('bfcl_eval ok')"
test -f env_service/environments/bfcl/bfcl_data/multi_turn_processed.jsonl
test -f data/bfcl_400_split.json

If bfcl_eval imports but multi_turn_processed.jsonl is missing, the data preprocessing step failed. Install the missing dependency into the bfcl environment, not base, then rerun setup:
conda run -n bfcl pip install -r env_service/environments/bfcl/requirements.txt
bash env_service/environments/bfcl/setup.sh

In terminal 1:
source ~/miniconda3/bin/activate
conda activate bfcl
export BFCL_HOST=127.0.0.1
export BFCL_PORT=8082
bash env_service/launch_script/bfcl.sh

Keep this process running. In another terminal, check that the service is up:
curl -fsS http://127.0.0.1:8082/healthz

In terminal 2:
source ~/miniconda3/bin/activate
cd SEAL
conda activate seal
python launcher.py --conf exp/SEAL.yaml

Useful server-specific overrides:
python launcher.py --conf exp/SEAL.yaml -- \
trainer.n_gpus_per_node=8 \
actor_rollout_ref.model.path=/path/to/Qwen2.5-7B-Instruct \
env_service.env_url=http://127.0.0.1:8082

The launcher writes a config/code backup to launcher_record/SEAL/, and training outputs are written under experiments/SEAL/.
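When the BFCL service and the training launch are scripted rather than run by hand in two terminals, the health check can be turned into a wait loop. The sketch below is a Python counterpart to the `curl -fsS .../healthz` check, using only the standard library; the function name, timeout, and polling interval are assumptions for illustration and are not part of launcher.py.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url, timeout_s=60.0, interval_s=2.0):
    """Poll `url` until it answers with a 2xx status or `timeout_s` elapses.

    Returns True once the service responds successfully, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, HTTP error status, or timeout: retry below.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example use before starting training (default BFCL address from this README):
#   if not wait_for_service("http://127.0.0.1:8082/healthz"):
#       raise SystemExit("BFCL service did not become healthy in time")
```

Like `curl -f`, the loop treats HTTP error statuses as failures, so a service that is listening but not yet healthy keeps being polled until the deadline.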
Validation generations are written as JSONL files under
experiments/tech_synthetic/<experiment_name>/validation_log. Summarize
per-category BFCL pass rates with:
TEST_PARQUET=data/bfcl_eval_400.parquet
BFCL_JSONL=env_service/environments/bfcl/bfcl_data/multi_turn_processed.jsonl
python3 scripts/stats_validation_bfcl.py \
--val-dir experiments/tech_synthetic/SEAL/validation_log \
--parquet "${TEST_PARQUET}" \
--bfcl-jsonl "${BFCL_JSONL}"

If you want launcher.py to start BFCL automatically, configure .env first:
cp example.env .env

Edit BFCL_SCRIPT in .env so it uses your actual Conda base path:
BFCL_SCRIPT="source /path/to/miniconda/bin/activate bfcl; bash bfcl.sh"

Then run:
source ~/miniconda3/bin/activate
cd SEAL
conda activate seal
python launcher.py --conf exp/SEAL.yaml --with-bfcl

DASHSCOPE_API_KEY is not required for the released SEAL recipe because task_manager.n=0 and synthetic_data_ratio=0.0. Set it only if you enable synthetic task exploration.
exp/SEAL.yaml                        Full closed-loop SEAL co-evolution recipe
env_service/environments/bfcl/       Executable verifier and train-time interface evolution layer
agentevolver/module/tocf/            Failure diagnosis, capability-state tracking, and A-Patch advantage reweighting
agentevolver/module/task_manager/    BFCL task adaptation, reward routing, and diagnostic dense grader
data/                                Low-resource BFCL train/evaluation splits used by SEAL
SEAL builds on ideas and infrastructure from modelscope/AgentEvolver and verl-project/verl. We sincerely thank the authors and contributors of these projects.
@article{hu2026seal,
title={SEAL: Synergistic Co-Evolution of Agents and Learning Environments},
author={Hu, Yihao and Wen, Zhihao and Liu, Xiujin and Wang, Pan and Zhang, Xin and Wu, Wei},
journal={arXiv preprint arXiv:2605.24426},
year={2026},
url={https://arxiv.org/abs/2605.24426}
}

This project is released under the Apache License 2.0.

