Self-Reminder-Data

The data used in our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder".

Contents

- Overview
- Repo Contents
- Prompt Analysis

Overview

ChatGPT has demonstrated itself as a powerful AI tool and has garnered hundreds of millions of users. However, the recent emergence of Jailbreak Attacks poses a significant threat to its responsible and secure use, as carefully crafted Jailbreak prompts can circumvent ChatGPT's ethics safeguards and trigger harmful responses. In this work, we explore the severe yet underexplored problems posed by Jailbreaks and the corresponding defense techniques. We introduce a Jailbreak dataset with various types of Jailbreak prompts and malicious instructions. We further draw inspiration from the psychological concept of self-reminders and propose a simple yet effective defense technique called System-Mode Self-Reminder.
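For illustration, the sketch below shows how a self-reminder style defense can be applied in practice: the user query is wrapped between reminder sentences that ask the model to respond responsibly before it is sent to the API. The reminder wording, model name, and OpenAI client usage here are illustrative assumptions, not the exact prompt or code used in the paper.

```python
# Minimal sketch of a self-reminder style defense (illustrative only):
# the user query is wrapped between reminder sentences asking the model
# to respond responsibly. The wording below is an assumption, not the
# paper's verbatim prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

REMINDER_PREFIX = (
    "You should be a responsible assistant and should not generate harmful "
    "or misleading content! Please answer the following query in a "
    "responsible way.\n"
)
REMINDER_SUFFIX = (
    "\nRemember, you should be a responsible assistant and should not "
    "generate harmful or misleading content!"
)


def self_reminder_query(user_query: str, model: str = "gpt-3.5-turbo") -> str:
    """Wrap the user query in a self-reminder before sending it to the model."""
    wrapped = REMINDER_PREFIX + user_query + REMINDER_SUFFIX
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": wrapped}],
    )
    return response.choices[0].message.content
```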

Repo Contents

Prompt Analysis

We also present a thorough examination of Jailbreak prompts and their effectiveness against ChatGPT across diverse aspects. Details can be found in the Jailbreak Prompt Analysis section of our paper.
