
Is Watermarking LLM-Generated Code Robust?

ℹ️ About | 📖 Watermarked Program Transformation | 🚀 Quick Start

ℹ️ About

Official implementation of "Is Watermarking LLM-Generated Code Robust?".

In the paper, we present the first study of the robustness of existing watermarking techniques on Python code generated by large language models. Although prior work has shown that watermarking can be robust for natural language, we show that these watermarks are easy to remove from code via semantic-preserving transformations. We propose an algorithm that walks the Abstract Syntax Tree (AST) of the watermarked code and randomly applies semantic-preserving program modifications. We observe significantly lower true-positive rates (TPR) of detection even under simple modifications, underscoring the need for robust LLM watermarks tailored specifically for code.

This repository contains code for:

🌊 Watermarking LLM-generated code
💯 Evaluating functional correctness of watermarked code on the HumanEval dataset
✏️ Applying realistic semantic-preserving transformations to the watermarked code

If you find this repository useful, please cite our paper:

@misc{suresh2024watermarking,
      title={Is Watermarking LLM-Generated Code Robust?}, 
      author={Tarun Suresh and Shubham Ugare and Gagandeep Singh and Sasa Misailovic},
      year={2024},
      eprint={2403.17983},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

📖 Watermarked Program Transformation

In practice, a user may modify watermarked LLM-generated code to better integrate it within a larger program or to evade detection. We assume the user has only black-box input-output access to the model and no knowledge of the watermarking algorithm. The user can apply a series of semantic-preserving transformations, e.g., inserting print statements or renaming variables, to modify the code. We replicate these program modifications in an algorithm (sketched below) that:

  1. Takes the watermarked code and the number of transformations to apply as input.
  2. Parses the watermarked code to obtain the AST representation of the code.
  3. Randomly selects a transformation from a set of transformations to apply.
  4. Traverses the AST to determine the set of all possible insertion, deletion, or substitution locations for that transformation.
  5. Transforms the AST at a randomly selected subtree by replacing the sequence of terminals with a "hole" and then completing it with a randomly syntactically-valid sequence.

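For illustration, here is a minimal, self-contained sketch of such a perturbation loop. It is not the repository's implementation: it uses only Python's standard `ast` module and two toy transformations (adding dead code and wrapping a statement in try/except), but it follows the same parse-select-transform structure described above.

```python
import ast
import random

def add_dead_code(tree: ast.Module) -> ast.Module:
    # Insert an `if False: pass` block at a random top-level position.
    dead = ast.parse("if False:\n    pass").body[0]
    tree.body.insert(random.randint(0, len(tree.body)), dead)
    return tree

def wrap_with_try(tree: ast.Module) -> ast.Module:
    # Wrap a randomly chosen top-level statement in a try/except that re-raises,
    # leaving the program's behavior unchanged.
    idx = random.randrange(len(tree.body))
    handler = ast.ExceptHandler(
        type=ast.Name(id="Exception", ctx=ast.Load()),
        name=None,
        body=[ast.Raise(exc=None, cause=None)],
    )
    tree.body[idx] = ast.Try(
        body=[tree.body[idx]], handlers=[handler], orelse=[], finalbody=[]
    )
    return tree

TRANSFORMATIONS = [add_dead_code, wrap_with_try]

def perturb(code: str, num_transformations: int) -> str:
    tree = ast.parse(code)                            # step 2: parse to an AST
    for _ in range(num_transformations):
        transform = random.choice(TRANSFORMATIONS)    # step 3: pick a transformation
        tree = transform(tree)                        # steps 4-5: mutate a random location
        ast.fix_missing_locations(tree)
    return ast.unparse(tree)                          # regenerate source (Python 3.9+)

print(perturb("def add(a, b):\n    return a + b\n", num_transformations=2))
```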
We implement the following semantic-preserving transformations:

- Replace True False
- Rename Variables
- Insert Print Statements
- Wrap With Try Catch
- Remove Comments
- Unroll While Loops
- Add Dead Code
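
As one concrete illustration, the "Remove Comments" transformation can be realized by a simple round-trip through Python's `ast` module, since comments never appear in the AST (a straightforward sketch under that assumption, not necessarily the repository's code):

```python
import ast

def remove_comments(code: str) -> str:
    # Parsing and unparsing (Python 3.9+) yields behaviorally equivalent
    # code with all comments stripped.
    return ast.unparse(ast.parse(code))

print(remove_comments("x = 1  # this comment disappears\n"))  # -> "x = 1"
```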

🚀 Quick Start

Installation Instructions

Install human-eval: https://github.com/openai/human-eval

Install llama: https://github.com/facebookresearch/llama

The model checkpoint location is currently hardcoded to /share/models/llama_model/llama/.

The original LM Watermarking implementation builds on the huggingface/transformers 🤗 library. To convert the Llama model weights to the Hugging Face Transformers format, run the following script:

python lpw/convert_llama_weights_to_hf.py \
    --input_dir /share/models/llama_model/llama/ --model_size 13B --output_dir /share/models/llama_model/hf/13B/

Thereafter, models can be loaded via:

from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the converted checkpoint and tokenizer from the Hugging Face format directory
model = LlamaForCausalLM.from_pretrained("/share/models/llama_model/hf/13B/")
tokenizer = LlamaTokenizer.from_pretrained("/share/models/llama_model/hf/13B/")
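
Once loaded, the model behaves like any other Hugging Face causal LM. As a generic usage example (not tied to the watermarking scripts), a completion can be generated with the standard generate API:

```python
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```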

Running Watermarking and Applying Program Transformations

Watermarking can be run with any language model supported by Hugging Face Transformers. Currently, the supported watermarking algorithms are UMD, SWEET, Unigram, and RobDist. The following command runs watermarking with the UMD algorithm on Llama-7B, generating Python completions for problems in the HumanEval dataset:

python lpw/run_watermark.py \
    --model_name_or_path /share/models/llama_model/hf/Llama-7b --language python --dataset multi-humaneval

Watermarking results are written to results/watermarking/. Thereafter, one can apply any of the aforementioned semantic-preserving transformations, or a combination of them, to the watermarked code. The following command inserts 5 print statements into the watermarked code:

python lpw/perturb_watermark.py \
    --model_name_or_path /share/models/llama_model/hf/Llama-7b --language python --dataset multi-humaneval --perturbation_ids 5 --depths 5 

Arguments

A full list of arguments can be found here.
