# wmbench: An Open-Source Benchmarking Toolkit for Efficient LLM Watermark Generation and Detection at Scale
This project is under active development. We are continuously integrating new watermarking schemes and evaluation metrics.
- **High Scalability**: A modular architecture that makes it easy to integrate custom watermarking schemes and evaluation metrics.
- **Extensive Baseline Support (planned)**: Support for a wide range of state-of-the-art (SOTA) watermarking algorithms (e.g., KGW, Unforgeable Watermarks).
- **Optimized Performance**: High-efficiency parallel generation and detection, tailored for large-scale benchmarking tasks.
- KGW: [Kirchenbauer et al., 2023] A Watermark for Large Language Models.
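At a high level, KGW partitions the vocabulary into a "green" list seeded by the preceding token, biases generation toward green tokens, and detects the watermark by counting green hits and computing a z-score against the unwatermarked expectation. Below is a minimal sketch of the detection side only; the function names, the SHA-256 seeding, and the toy vocabulary are illustrative assumptions, not this toolkit's API:

```python
import hashlib
import math
import random

def green_ids(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set:
    """Derive a deterministic green list from the previous token.
    (Illustrative: seeds a PRNG with a SHA-256 hash of the token id.)"""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))

def kgw_z_score(tokens: list, vocab_size: int, gamma: float = 0.5) -> float:
    """Count tokens that land in their context's green list and compare
    against the gamma * T hits expected with no watermark."""
    hits = sum(
        1 for prev, tok in zip(tokens, tokens[1:])
        if tok in green_ids(prev, vocab_size, gamma)
    )
    t = len(tokens) - 1  # number of scored transitions
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))
```

A sequence whose every token is drawn from its green list scores roughly `sqrt(T)` standard deviations above chance, which is why even short watermarked texts are detectable.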
This project uses uv for extremely fast Python package and environment management.
Ensure you have uv installed.
Clone the repository and sync the dependencies:
```bash
# Create the virtual environment and install dependencies automatically
uv sync
```

The primary entry point for large-scale evaluation is `batch_benchmark.py`. This script runs watermark generation and detection across various configurations.
To execute a batch benchmark task, use:
```bash
# Using 'uv run' to ensure the environment is correctly loaded
uv run batch_benchmark.py
```

Note: you can customize the parameters in `batch_benchmark.py` or via command-line arguments to suit your experimental setup.
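Conceptually, a batch benchmark sweeps a grid of configurations, generating and then detecting under each one. The sketch below shows that shape with placeholder `generate`/`detect` stubs; the names, parameters, and scheme list are hypothetical, not the script's actual interface:

```python
from itertools import product

# Hypothetical stand-ins for the toolkit's generation/detection hooks.
def generate(scheme: str, gamma: float, prompt: str) -> str:
    return f"[{scheme} gamma={gamma}] text for: {prompt}"

def detect(scheme: str, text: str) -> bool:
    return scheme in text  # placeholder detector

schemes = ["KGW"]          # watermarking schemes to benchmark
gammas = [0.25, 0.5]       # example hyperparameter grid
prompts = ["hello"]        # evaluation prompts

results = []
for scheme, gamma, prompt in product(schemes, gammas, prompts):
    text = generate(scheme, gamma, prompt)
    results.append({"scheme": scheme, "gamma": gamma,
                    "detected": detect(scheme, text)})
```

Each grid point yields one result record, so detection rates can later be aggregated per scheme and hyperparameter.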
- Add support for more watermarking schemes.
- Integrate more detection metrics (e.g., ROC/AUC analysis).
- Comprehensive documentation for custom algorithm integration.
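For the planned ROC/AUC metric, the AUC can be computed directly from detector scores as the probability that a watermarked sample outranks a clean one (the Mann-Whitney formulation), without any plotting dependency. A minimal sketch, assuming per-sample scalar detection scores:

```python
def auc(pos_scores: list, neg_scores: list) -> float:
    """AUC as the fraction of (watermarked, clean) score pairs where the
    watermarked score wins; ties count as half a win (Mann-Whitney U
    divided by n_pos * n_neg)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```

A perfectly separating detector yields 1.0; a detector no better than chance yields about 0.5.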
This project is inspired by MarkLLM.
If you find this project helpful, a star would be greatly appreciated!