GitHub - zwhe99/X-SIR: [ACL 2024] Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models

💧 X-SIR: A text watermark that survives translation

Implementaion of our paper:

Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models

🔥 News

[Apr 8, 2024]: New repo released!

Conda environment

Tested on the following environment, but it should work on other versions.

python 3.10.10
pytorch
pip3 install -r requirements.txt
[optional] pip3 install flash-attn==2.3.3

Overview

src_watermark implements three text watermarking methods (x-sir, sir and kgw) with a unified interface.
attack contains two watermarking removal methods: paraphrase and translation
Scripts:
- gen.py: generate text with watermark
- detect.py: compute z-score for given texts
- eval_detection.py: calculate AUC, TPR, and F1 for watermark detection
- You can use --help to see full usage of these scripts.
Supported models:
- meta-llama/Llama-2-7b-hf
- baichuan-inc/Baichuan2-7B-Base
- baichuan-inc/Baichuan-7B
- mistralai/Mistral-7B-v0.1
Supported languages: English (En), German (De), French (Fr), Chinese (Zh), Japanese (Ja)
You can learn how to extend model and language in from-scratch.md.

Usage (No attack)

Generate text with watermark

MODEL_NAME=baichuan-inc/Baichuan-7B
MODEL_ABBR=baichuan-7b
TRANSFORM_MODEL=data/model/transform_model_x-sbert_10K.pth
MAPPING_FILE=data/mapping/xsir/300_mapping_$MODEL_ABBR.json

WATERMARK_METHOD_FLAG="--watermark_method xsir  --transform_model $TRANSFORM_MODEL --embedding_model paraphrase-multilingual-mpnet-base-v2 --mapping_file $MAPPING_FILE"

python3 gen.py \
    --base_model $MODEL_NAME \
    --fp16 \
    --batch_size 32 \
    --input_file data/dataset/mc4/mc4.en.jsonl \
    --output_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
    --WATERMARK_METHOD_FLAG

Compute the z-scores

# Compute z-score for human-written text
python3 detect.py \
    --base_model $MODEL_NAME \
    --detect_file data/dataset/mc4/mc4.en.jsonl \
    --output_file gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
    $WATERMARK_METHOD_FLAG

# Compute z-score for watermarked text
python3 detect.py \
    --base_model $MODEL_NAME \
    --detect_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
    --output_file gen/$MODEL_ABBR/xsir/mc4.en.mod.z_score.jsonl \
    $WATERMARK_METHOD_FLAG

Evaluation

python3 eval_detection.py \
	--hm_zscore gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
	--wm_zscore gen/$MODEL_ABBR/xsir/mc4.en.mod.z_score.jsonl

AUC: 0.994

TPR@FPR=0.1: 0.994
TPR@FPR=0.01: 0.862

F1@FPR=0.1: 0.955
F1@FPR=0.01: 0.921

Usage (With attack)

Here we test the watermark after translating to other languages (De, Fr, Zh, Ja).

Preparation

We use ChatGPT to perform paraphrase and translation. Therefore:

Set you openai api key: export OPENAI_API_KEY=xxxx
You may also want to modify the RPMs and TPMs in attack/const.py

Translation

TGT_LANGS=("de" "fr" "zh" "ja")
for TGT_LANG in "${TGT_LANGS[@]}"; do
    python3 attack/translate.py \
        --input_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
        --output_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.jsonl \
        --model gpt-3.5-turbo-1106 \
        --src_lang en \
        --tgt_lang $TGT_LANG
done

Compute the z-scores

for TGT_LANG in "${TGT_LANGS[@]}"; do
    python3 detect.py \
        --base_model $MODEL_NAME \
        --detect_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.jsonl \
        --output_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.z_score.jsonl \
        $WATERMARK_METHOD_FLAG
done

Evaluation

for TGT_LANG in "${TGT_LANGS[@]}"; do
    echo "En->$TGT_LANG"
    python3 eval_detection.py \
        --hm_zscore gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
        --wm_zscore gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.z_score.jsonl
done

En->de
AUC: 0.769

TPR@FPR=0.1: 0.318
TPR@FPR=0.01: 0.060

F1@FPR=0.1: 0.450
F1@FPR=0.01: 0.112

En->fr
AUC: 0.810

TPR@FPR=0.1: 0.354
TPR@FPR=0.01: 0.046

F1@FPR=0.1: 0.488
F1@FPR=0.01: 0.087

En->zh
AUC: 0.905

TPR@FPR=0.1: 0.702
TPR@FPR=0.01: 0.182

F1@FPR=0.1: 0.781
F1@FPR=0.01: 0.305

En->ja
AUC: 0.911

TPR@FPR=0.1: 0.696
TPR@FPR=0.01: 0.112

F1@FPR=0.1: 0.775
F1@FPR=0.01: 0.200

Usage (With attack)

You can use the following flags to specify the watermarking method:

KGW

WATERMARK_METHOD_FLAG="--watermark_method kgw"

SIR

MODEL_NAME=baichuan-inc/Baichuan-7B
MODEL_ABBR=baichuan-7b
TRANSFORM_MODEL=data/model/transform_model_x-sbert_10K.pth
MAPPING_FILE=data/mapping/sir/300_mapping_$MODEL_ABBR.json

WATERMARK_METHOD_FLAG="--watermark_method sir  --transform_model $TRANSFORM_MODEL --embedding_model paraphrase-multilingual-mpnet-base-v2 --mapping_file $MAPPING_FILE"

Acknowledgement

This work can not be done without the help of the following repos:

Citation

@article{he2024can,
  title={Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models},
  author={He, Zhiwei and Zhou, Binglin and Hao, Hongkun and Liu, Aiwei and Wang, Xing and Tu, Zhaopeng and Zhang, Zhuosheng and Wang, Rui},
  journal={arXiv preprint arXiv:2402.14007},
  year={2024}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

💧 X-SIR: A text watermark that survives translation

Usage (No attack)

Usage (With attack)

Preparation

Usage (With attack)

Acknowledgement

Citation

About

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
assert		assert
attack		attack
data		data
gen		gen
scripts		scripts
src_watermark		src_watermark
.gitignore		.gitignore
README.md		README.md
detect.py		detect.py
eval_detection.py		eval_detection.py
from-scratch.md		from-scratch.md
gen.py		gen.py
requirements.txt		requirements.txt
utils.py		utils.py

zwhe99/X-SIR

Folders and files

Latest commit

History

Repository files navigation

💧 X-SIR: A text watermark that survives translation

Usage (No attack)

Usage (With attack)

Preparation

Usage (With attack)

Acknowledgement

Citation

About

Topics

Resources

Stars

Watchers

Forks

Languages