Skip to content

uw-nsl/ArtPrompt

Repository files navigation

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

Fengqing Jiang1,* ,  Zhangchen Xu1,* ,  Luyao Niu1,* , 
Zhen Xiang2 ,  Bhaskar Ramasubramanian3 , 
Bo Li4 ,  Radha Poovendran1  

1University of Washington   2University of Illinois Urbana-Champaign   
3Western Washington University   4University of Chicago   
*Equal Contribution

Warning: This project contains model outputs that may be considered offensive

[arXiv]

Overview

How to Use ArtPrompt

Quick Start

We provide a demo prompt to show the effectiveness of ArtPrompt in notebook demo.ipynb (also at demo_prompt.txt). This is a successful prompt toward gpt-4-0613.

Run with ArtPrompt

Setup Environment

  • Make sure setup your API key in utils/model.py (or in environment) before running experiment.

Running

Run evaluation on vitc-s dataset. More details please refer to benchmark.py

# at dir ArtPrompt
python benchmark.py --model gpt-4-0613 --task s

Run jailbreak with ArtPrompt. More details please refer to baseline.py

cd jailbreak
python baseline.py --model gpt-4-0613 --tmodel gpt-3.5-turbo-0613 

You could use --mp arg to accelerate the inference time based on the available cpu cores on your machine.

Acknowledgement

Our project built upon the work from python-art,llm-attack, AutoDan, PAIR, DeepInception, LLM-Finetuning-Safety, BPE-Dropout. We appreciated these open-sourced work in the community.

Citation

If you find our project useful in your research, please consider citing:

@misc{jiang2024artprompt,
      title={ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs}, 
      author={Fengqing Jiang and Zhangchen Xu and Luyao Niu and Zhen Xiang and Bhaskar Ramasubramanian and Bo Li and Radha Poovendran},
      year={2024},
      eprint={2402.11753},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

About

Official Implementation of ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published