# wmbench: An Open-Source Benchmarking Toolkit for Efficient LLM Watermark Generation and Detection at Scale
This project is under active development. We are continuously integrating new watermarking schemes and evaluation metrics.
- **High Scalability**: A modular architecture that makes it easy to integrate custom watermarking schemes and evaluation metrics.
- **Extensive Baseline Support (planned)**: Support for a wide range of state-of-the-art (SOTA) watermarking algorithms (e.g., KGW, Unforgeable Watermarks).
- **Optimized Performance**: High-efficiency parallel generation and detection, tailored for large-scale benchmarking tasks.
- KGW: [Kirchenbauer et al., 2023] A Watermark for Large Language Models.
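At a high level, KGW partitions the vocabulary into a "green" list seeded by the preceding token, biases generation toward green tokens, and detects the watermark by counting green hits and computing a z-score against the unwatermarked expectation. Below is a minimal sketch of the detection side only; the function names, the SHA-256 seeding, and the toy vocabulary are illustrative assumptions, not this toolkit's API:

```python
import hashlib
import math
import random

def green_ids(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set:
    """Derive a deterministic green list from the previous token.
    (Illustrative: seeds a PRNG with a SHA-256 hash of the token id.)"""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))

def kgw_z_score(tokens: list, vocab_size: int, gamma: float = 0.5) -> float:
    """Count tokens that land in their context's green list and compare
    against the gamma * T hits expected with no watermark."""
    hits = sum(
        1 for prev, tok in zip(tokens, tokens[1:])
        if tok in green_ids(prev, vocab_size, gamma)
    )
    t = len(tokens) - 1  # number of scored transitions
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))
```

A sequence whose every token is drawn from its green list scores roughly `sqrt(T)` standard deviations above chance, which is why even short watermarked texts are detectable.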
This project uses uv for extremely fast Python package and environment management.
Ensure you have uv installed.
Clone the repository and sync the dependencies:
```bash
# Create the virtual environment and install dependencies automatically
uv sync
```

The primary entry point for large-scale evaluation is `batch_benchmark.py`. This script runs watermark generation and detection across various configurations.
To execute a batch benchmark task, use:
```bash
# Using 'uv run' to ensure the environment is correctly loaded
uv run batch_benchmark.py
```

Note: you can customize the parameters in `batch_benchmark.py` or via command-line arguments to suit your experimental setup.
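Conceptually, a batch benchmark sweeps a grid of configurations, generating and then detecting under each one. The sketch below shows that shape with placeholder `generate`/`detect` stubs; the names, parameters, and scheme list are hypothetical, not the script's actual interface:

```python
from itertools import product

# Hypothetical stand-ins for the toolkit's generation/detection hooks.
def generate(scheme: str, gamma: float, prompt: str) -> str:
    return f"[{scheme} gamma={gamma}] text for: {prompt}"

def detect(scheme: str, text: str) -> bool:
    return scheme in text  # placeholder detector

schemes = ["KGW"]          # watermarking schemes to benchmark
gammas = [0.25, 0.5]       # example hyperparameter grid
prompts = ["hello"]        # evaluation prompts

results = []
for scheme, gamma, prompt in product(schemes, gammas, prompts):
    text = generate(scheme, gamma, prompt)
    results.append({"scheme": scheme, "gamma": gamma,
                    "detected": detect(scheme, text)})
```

Each grid point yields one result record, so detection rates can later be aggregated per scheme and hyperparameter.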
- Add support for more watermarking schemes.
- Integrate more detection metrics (e.g., ROC/AUC analysis).
- Comprehensive documentation for custom algorithm integration.
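For the planned ROC/AUC metric, the AUC can be computed directly from detector scores as the probability that a watermarked sample outranks a clean one (the Mann-Whitney formulation), without any plotting dependency. A minimal sketch, assuming per-sample scalar detection scores:

```python
def auc(pos_scores: list, neg_scores: list) -> float:
    """AUC as the fraction of (watermarked, clean) score pairs where the
    watermarked score wins; ties count as half a win (Mann-Whitney U
    divided by n_pos * n_neg)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```

A perfectly separating detector yields 1.0; a detector no better than chance yields about 0.5.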
This project is inspired by MarkLLM.
If you find this project helpful, a star would be greatly appreciated!