
Is Watermarking LLM-Generated Code Robust?

ℹ️ About | 📖 Watermarked Program Transformation | 🚀 Quick Start

ℹ️ About

Official implementation of "Is Watermarking LLM-Generated Code Robust?".

In the paper, we present the first study of the robustness of existing watermarking techniques on Python code generated by large language models. Although prior work has shown that watermarking can be robust for natural language, we show that these watermarks are easy to remove from code via semantic-preserving transformations. We propose an algorithm that walks the Abstract Syntax Tree (AST) of the watermarked code and randomly applies semantic-preserving program modifications. We observe significantly lower true-positive rates (TPR) of detection even under simple modifications, underscoring the need for robust LLM watermarks tailored specifically for code.

This repository contains code for:

🌊 Watermarking LLM-generated code
💯 Evaluating functional correctness of watermarked code on the HumanEval dataset
✏️ Applying realistic semantic-preserving transformations to the watermarked code

If you find this repository useful, please cite our paper:

@misc{suresh2024watermarking,
      title={Is Watermarking LLM-Generated Code Robust?}, 
      author={Tarun Suresh and Shubham Ugare and Gagandeep Singh and Sasa Misailovic},
      year={2024},
      eprint={2403.17983},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

📖 Watermarked Program Transformation

In practice, a user may modify watermarked LLM-generated code to better integrate it within a larger program or to evade detection. We assume the user has only black-box input-output access to the model and no knowledge of the watermarking algorithm. The user can apply a series of semantic-preserving transformations, e.g., inserting print statements or renaming variables, to modify the code. We replicate these program modifications in an algorithm (sketched below) that:

  1. Takes the watermarked code and the number of transformations to apply as input.
  2. Parses the watermarked code to obtain the AST representation of the code.
  3. Randomly selects a transformation from a set of transformations to apply.
  4. Traverses the AST to determine the set of all possible insertion, deletion, or substitution locations for that transformation.
  5. Transforms the AST at a randomly selected subtree by replacing the sequence of terminals with a "hole" and then completing it with a randomly syntactically-valid sequence.

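For illustration, here is a minimal, self-contained sketch of such a perturbation loop. It is not the repository's implementation: it uses only Python's standard `ast` module and two toy transformations (adding dead code and wrapping a statement in try/except), but it follows the same parse-select-transform structure described above.

```python
import ast
import random

def add_dead_code(tree: ast.Module) -> ast.Module:
    # Insert an `if False: pass` block at a random top-level position.
    dead = ast.parse("if False:\n    pass").body[0]
    tree.body.insert(random.randint(0, len(tree.body)), dead)
    return tree

def wrap_with_try(tree: ast.Module) -> ast.Module:
    # Wrap a randomly chosen top-level statement in a try/except that re-raises,
    # leaving the program's behavior unchanged.
    idx = random.randrange(len(tree.body))
    handler = ast.ExceptHandler(
        type=ast.Name(id="Exception", ctx=ast.Load()),
        name=None,
        body=[ast.Raise(exc=None, cause=None)],
    )
    tree.body[idx] = ast.Try(
        body=[tree.body[idx]], handlers=[handler], orelse=[], finalbody=[]
    )
    return tree

TRANSFORMATIONS = [add_dead_code, wrap_with_try]

def perturb(code: str, num_transformations: int) -> str:
    tree = ast.parse(code)                            # step 2: parse to an AST
    for _ in range(num_transformations):
        transform = random.choice(TRANSFORMATIONS)    # step 3: pick a transformation
        tree = transform(tree)                        # steps 4-5: mutate a random location
        ast.fix_missing_locations(tree)
    return ast.unparse(tree)                          # regenerate source (Python 3.9+)

print(perturb("def add(a, b):\n    return a + b\n", num_transformations=2))
```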
We implement the following semantic-preserving transformations:

- Replace True False
- Rename Variables
- Insert Print Statements
- Wrap With Try Catch
- Remove Comments
- Unroll While Loops
- Add Dead Code
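
As one concrete illustration, the "Remove Comments" transformation can be realized by a simple round-trip through Python's `ast` module, since comments never appear in the AST (a straightforward sketch under that assumption, not necessarily the repository's code):

```python
import ast

def remove_comments(code: str) -> str:
    # Parsing and unparsing (Python 3.9+) yields behaviorally equivalent
    # code with all comments stripped.
    return ast.unparse(ast.parse(code))

print(remove_comments("x = 1  # this comment disappears\n"))  # -> "x = 1"
```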

🚀 Quick Start

Installation Instructions

Install human-eval: https://github.com/openai/human-eval

Install llama: https://github.com/facebookresearch/llama

The model checkpoint location is currently hardcoded to /share/models/llama_model/llama/.

The original LM Watermarking implementation builds on the huggingface/transformers 🤗 library. To convert the Llama model weights to the Hugging Face Transformers format, run the following script:

python lpw/convert_llama_weights_to_hf.py \
    --input_dir /share/models/llama_model/llama/ --model_size 13B --output_dir /share/models/llama_model/hf/13B/

Thereafter, models can be loaded via:

from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the converted checkpoint and tokenizer from the Hugging Face format directory
model = LlamaForCausalLM.from_pretrained("/share/models/llama_model/hf/13B/")
tokenizer = LlamaTokenizer.from_pretrained("/share/models/llama_model/hf/13B/")
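
Once loaded, the model behaves like any other Hugging Face causal LM. As a generic usage example (not tied to the watermarking scripts), a completion can be generated with the standard generate API:

```python
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```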

Running Watermarking and Applying Program Transformations

Watermarking can be run with any language model supported by Hugging Face Transformers. Currently, the supported watermarking algorithms are UMD, SWEET, Unigram, and RobDist. The following command runs watermarking with the UMD algorithm on Llama-7B, generating Python completions for problems in the HumanEval dataset:

python lpw/run_watermark.py \
    --model_name_or_path /share/models/llama_model/hf/Llama-7b --language python --dataset multi-humaneval

Watermarking results are written to results/watermarking/. Thereafter, one can apply any of the aforementioned semantic-preserving transformations, or a combination of them, to the watermarked code. The following command inserts 5 print statements into the watermarked code:

python lpw/perturb_watermark.py \
    --model_name_or_path /share/models/llama_model/hf/Llama-7b --language python --dataset multi-humaneval --perturbation_ids 5 --depths 5 

Arguments

A full list of arguments can be found here.
