Skip to content

TrustAIRLab/VoiceJailbreakAttack

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Voice Jailbreak Attacks Against GPT-4o

arXiv: paper license: MIT

Disclaimer. This repo contains examples of harmful language. Reader discretion is recommended.

This is the official repository for Voice Jailbreak Attacks Against GPT-4o. In this paper, we present the first study on how to jailbreak GPT-4o with voice.

Check out our demo below!

rob_a_bank_480p.MOV

Code

  1. Set your OpenAI key
echo "export OPENAI_API_KEY='YOURKEY'" >> ~/.zshrc
source ~/.zshrc
echo $OPENAI_API_KEY # check your key
  1. Convert forbidden questions to audio files
python tts/prompt2audio.py --dataset baseline --voice fable
  1. Convert text jailbreak prompts to audio files
python tts/prompt2audio.py --dataset textjailbreak --voice fable

Then, manually play each audio on GPT-4o to test its performance.

Data

Forbidden Questions

  • English: data/question_set/questions_tiny.csv
  • Chinese: data/question_set/questions_tiny_zh.csv

Prompts

  • Text jailbreak prompts: data/jailbreak_prompts/text_jailbreak_prompts.csv
  • VoiceJailbreak prompts: data/jailbreak_prompts/voicejailbreak.csv
    • Plot format of the forbidden questions: data/question_set/questions_tiny_plot.csv

Success Cases

data/screenshot/

Ethics

We take utmost care of the ethics of our study. Specifically, all experiments are conducted using two registered accounts and manually labeled by the authors, thus eliminating the exposure risks to third parties, such as crowdsourcing workers. Therefore, our work is not considered human subjects research by our Institutional Review Boards (IRB). We acknowledge that evaluating GPT-4o's capabilities in answering forbidden questions can reveal how the model can be induced to generate inappropriate content. This can raise concerns about potential misuse. We believe it is important to disclose this research fully. The methods presented are straightforward to implement and are likely to be discovered by potential adversaries. We have responsibly disclosed our findings to related LLM vendors.

Citation

If you find this useful in your research, please consider citing:

@article{SWBZ24,
  author = {Xinyue Shen and Yixin Wu and Michael Backes and Yang Zhang},
  title = {{Voice Jailbreak Attacks Against GPT-4o}},
  journal = {{CoRR abs/2405.19103}},
  year = {2024}
}

License

VoiceJailbreak is licensed under the terms of the MIT license. See LICENSE for more details.

About

Code for Voice Jailbreak Attacks Against GPT-4o.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages