🐢 Open-Source Evaluation & Testing for ML & LLM systems
-
Updated
Nov 18, 2024 - Python
🐢 Open-Source Evaluation & Testing for ML & LLM systems
RuLES: a benchmark for evaluating rule-following in language models
[CCS'24] SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models
The official implementation of the CCS'23 paper, Narcissus clean-label backdoor attack -- only takes THREE images to poison a face recognition dataset in a clean-label way and achieves a 99.89% attack success rate.
Code for "Adversarial attack by dropping information." (ICCV 2021)
Train AI (Keras + Tensorflow) to defend apps with Django REST Framework + Celery + Swagger + JWT - deploys to Kubernetes and OpenShift Container Platform
Performing website vulnerability scanning using OpenAI technologie
ATLAS tactics, techniques, and case studies data
pytorch implementation of Parametric Noise Injection for adversarial defense
[IJCAI 2024] Imperio is an LLM-powered backdoor attack. It allows the adversary to issue language-guided instructions to control the victim model's prediction for arbitrary targets.
[NDSS'24] Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time
This repository provide the studies on the security of language models for code (CodeLMs).
The Prompt Injection Testing Tool is a Python script designed to assess the security of your AI system's prompt handling against a predefined list of user prompts commonly used for injection attacks. This tool utilizes the OpenAI GPT-3.5 model to generate responses to system-user prompt pairs and outputs the results to a CSV file for analysis.
Learning to Identify Critical States for Reinforcement Learning from Videos (Accepted to ICCV'23)
Unofficial pytorch implementation of paper: Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
Python library for Modzy Machine Learning Operations (MLOps) Platform
Image Prompt Injection is a Python script that demonstrates how to embed a secret prompt within an image using steganography techniques. This hidden prompt can be later extracted by an AI system for analysis, enabling covert communication with AI models through images.
The official implementation of USENIX Security'23 paper "Meta-Sift" -- Ten minutes or less to find a 1000-size or larger clean subset on poisoned dataset.
Evaluation & testing framework for computer vision models
Datasets for training deep neural networks to defend software applications
Add a description, image, and links to the ai-security topic page so that developers can more easily learn about it.
To associate your repository with the ai-security topic, visit your repo's landing page and select "manage topics."