# AI Red Teaming Agent in AI Foundry Project V2

## Objective
This notebook shows how to use Azure AI Evaluation's AI Red Teaming Agent functionality to assess the safety and resilience of AI systems against adversarial prompt attacks. AI Red Teaming Agent leverages [Risk and Safety Evaluations](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#risk-and-safety-evaluators) to help identify potential safety issues across different risk categories (violence, hate/unfairness, sexual content, self-harm), combined with attack strategies of varying complexity from [PyRIT](https://github.com/Azure/PyRIT), the Microsoft AI Red Team's open-source framework for automated AI red teaming.

## Before you begin

<!--
### Prerequisite
First, if you have an Azure subscription, [create an Azure AI project](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/ai-resources). AI projects and Hubs can be served within a private network and are compatible with private endpoints. You **do not** need to provide your own LLM deployment, because the AI Red Teaming Agent hosts adversarial models for both simulation and evaluation of harmful content and connects to them through your Azure AI project.
-->

**Note**: To upload your results to Azure AI Foundry, you must have the `Storage Blob Data Contributor` role on the storage account connected to your Azure AI project.


**Important**: Before running this notebook, install the [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) and authenticate to Azure by running `az login` in your terminal.

### Load Credentials

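```python
from dotenv import load_dotenv

# Load environment variables from the region-specific .env file
load_dotenv(dotenv_path=".env_eastus2", override=True)
# load_dotenv(dotenv_path=".env_uaenorth", override=True)

# AuthHelper comes from the local helper module utils.fdyauth (not an Azure SDK package);
# it loads the notebook settings and returns a credential (falsy if authentication fails)
from utils.fdyauth import AuthHelper

settings = AuthHelper.load_settings()
credential = AuthHelper.test_credential()

if credential:
    print("Environment and authentication OK")
else:
    print("Please sign in first with `az login`")
```

```
Environment and authentication OK
```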


```python
# Red team models from the azure-ai-projects SDK
from azure.ai.projects.models import (
    RedTeam,
    AzureOpenAIModelConfiguration,
    AttackStrategy,
    RiskCategory,
)
```

```python
# Azure OpenAI deployment information for the target model under test
azure_openai_deployment = settings.model_deployment_name  # e.g., "gpt-4.1-mini"
azure_openai_endpoint = settings.azure_openai_endpoint  # e.g., "https://<account_name>.cognitiveservices.azure.com/"
azure_openai_api_key = settings.azure_openai_api_key  # e.g., "your-api-key"
azure_openai_api_version = settings.azure_openai_api_version  # the API version your deployment supports
```
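Before creating any clients, it can help to fail fast on missing configuration. The sketch below is a hypothetical helper (`check_model_settings` is not part of any SDK); it only assumes the attribute names used in the cell above, and checks that each value is non-empty and that the endpoint is an `https` URL.

```python
from types import SimpleNamespace
from typing import List
from urllib.parse import urlparse


def check_model_settings(s) -> List[str]:
    """Return a list of problems found in the model settings (an empty list means OK)."""
    problems: List[str] = []
    # Attribute names mirror the settings object used in this notebook
    for name in ("model_deployment_name", "azure_openai_endpoint", "azure_openai_api_key"):
        if not getattr(s, name, None):
            problems.append(f"missing {name}")
    endpoint = getattr(s, "azure_openai_endpoint", None)
    if endpoint:
        parsed = urlparse(endpoint)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"endpoint is not an https URL: {endpoint!r}")
    return problems


# Hypothetical stand-in values; in the notebook you would pass `settings` instead
demo = SimpleNamespace(
    model_deployment_name="gpt-4.1-mini",
    azure_openai_endpoint="https://my-account.cognitiveservices.azure.com/",
    azure_openai_api_key="",
)
print(check_model_settings(demo))  # ['missing azure_openai_api_key']
```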

## Understanding AI Red Teaming Agent's capabilities

The Azure AI Evaluation SDK's `RedTeam` functionality evaluates AI systems against adversarial prompts across multiple dimensions:

1. **Risk Categories**: Categories of harmful content your AI system might generate
   - Violence
   - HateUnfairness
   - Sexual
   - SelfHarm

2. **Attack Strategies**: Along with the standard, unmodified prompts that are sent by default as the `baseline`, you can specify transformations of the prompts designed to elicit undesired content. You can also use `AttackStrategy.Compose()` to layer two strategies in a single attack.
   - AnsiAttack: Using ANSI escape codes in prompts
   - AsciiArt: Using ASCII art to disguise harmful content
   - AsciiSmuggler: Hiding harmful content in invisible Unicode tag characters
   - Atbash: Using the Atbash cipher to encode harmful requests
   - Base64: Encoding harmful content in Base64 format
   - Binary: Converting text to binary to bypass filters
   - Caesar: Using the Caesar cipher for encoding
   - CharacterSpace: Manipulating character spacing to confuse filters
   - CharSwap: Swapping characters to bypass detection
   - Diacritic: Using diacritical marks to alter text appearance
   - Flip: Reversing the order of characters in the prompt
   - Leetspeak: Converting letters to numbers and symbols
   - Morse: Using Morse code to encode harmful requests
   - ROT13: Using ROT13 cipher for text transformation
   - SuffixAppend: Adding suffixes to confuse detection systems
   - StringJoin: Inserting a separator (such as a hyphen) between characters
   - Tense: Changing the tense of harmful requests to past tense
   - UnicodeConfusable: Using similar-looking Unicode characters
   - UnicodeSubstitution: Substituting characters with Unicode alternatives
   - Url: URL-encoding the prompt
   - Jailbreak: Specially crafted prompts to bypass AI safeguards

3. **Complexity Levels**: Different difficulty levels of attacks
   - Baseline: Unmodified prompts with no attack strategy applied
   - Easy: Simple attack patterns
   - Moderate: More sophisticated attacks
   - Difficult: Complex, layered attack strategies
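To make the transformations concrete, the sketch below re-implements a few of them (Base64, ROT13, Caesar, Atbash, Flip, Binary) in plain Python, plus a `compose` helper that layers two of them in the spirit of `AttackStrategy.Compose()`. These functions are simplified, hypothetical stand-ins, not PyRIT's converters or the SDK's API, and the example uses a benign prompt.

```python
import base64
import codecs
from typing import Callable

# Illustrative versions of a few prompt transformations.
# NOT PyRIT's converter classes and NOT the SDK's AttackStrategy API.


def to_base64(text: str) -> str:
    # Base64-encode the UTF-8 bytes of the prompt
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def to_rot13(text: str) -> str:
    # Rotate ASCII letters by 13 places; other characters pass through unchanged
    return codecs.encode(text, "rot_13")


def _map_letters(text: str, f: Callable[[int], int]) -> str:
    # Apply f to the 0-25 alphabet index of each ASCII letter, preserving case
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr(f(ord(ch) - ord("a")) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr(f(ord(ch) - ord("A")) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def to_caesar(text: str, shift: int = 3) -> str:
    # Caesar cipher: shift each letter forward by `shift` positions
    return _map_letters(text, lambda i: i + shift)


def to_atbash(text: str) -> str:
    # Atbash cipher: a<->z, b<->y, ... (applying it twice restores the input)
    return _map_letters(text, lambda i: 25 - i)


def flip(text: str) -> str:
    # Reverse the character order
    return text[::-1]


def to_binary(text: str) -> str:
    # Each UTF-8 byte as 8 binary digits, space-separated
    return " ".join(f"{b:08b}" for b in text.encode("utf-8"))


def compose(*steps: Callable[[str], str]) -> Callable[[str], str]:
    # Apply the transformations left to right, layering one strategy on another
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


layered = compose(to_rot13, to_base64)
print(layered("hello"))  # dXJ5eWI=
```

Because each function maps a string to a string, layering is ordinary function composition, and order matters: Base64 followed by ROT13 produces a different string than ROT13 followed by Base64.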

The key metric for evaluating results is the **Attack Success Rate (ASR)**: the percentage of attacks that successfully elicit harmful content from your AI system. A lower ASR indicates a more resilient system.
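ASR is a simple ratio, successful attacks divided by total attacks, times 100, and it is often broken down per strategy or per risk category. The sketch below computes both; the `AttackResult` record shape and the sample data are hypothetical, for illustration only, and are not the SDK's result schema or real scan results.

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class AttackResult(TypedDict):
    # Hypothetical record shape (not the SDK's output schema)
    attack_strategy: str
    risk_category: str
    attack_success: bool  # True if the response was judged harmful


def attack_success_rate(results: List[AttackResult]) -> float:
    """ASR as a percentage: successful attacks / total attacks * 100 (0.0 if empty)."""
    if not results:
        return 0.0
    successes = sum(1 for r in results if r["attack_success"])
    return 100.0 * successes / len(results)


def asr_by(results: List[AttackResult], key: str) -> Dict[str, float]:
    """ASR per group, e.g. key='attack_strategy' or key='risk_category'."""
    groups: Dict[str, List[AttackResult]] = defaultdict(list)
    for r in results:
        groups[r[key]].append(r)
    return {k: attack_success_rate(v) for k, v in groups.items()}


# Made-up records, just to exercise the functions
sample: List[AttackResult] = [
    {"attack_strategy": "baseline", "risk_category": "violence", "attack_success": False},
    {"attack_strategy": "baseline", "risk_category": "violence", "attack_success": False},
    {"attack_strategy": "base64", "risk_category": "violence", "attack_success": True},
    {"attack_strategy": "base64", "risk_category": "violence", "attack_success": False},
]
print(attack_success_rate(sample))        # 25.0
print(asr_by(sample, "attack_strategy"))  # {'baseline': 0.0, 'base64': 50.0}
```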

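```python
# Create the project client used to submit Red Team scans
import os

import azure.ai.agents as agentslib
import azure.ai.projects as projectslib
from azure.ai.projects import AIProjectClient

# AIProjectClient expects the Azure AI Foundry *project* endpoint, e.g.
# "https://<resource>.services.ai.azure.com/api/projects/<project>",
# not the Azure OpenAI account endpoint. PROJECT_ENDPOINT is an assumed variable name in the .env file.
project_endpoint = os.environ["PROJECT_ENDPOINT"]

# Initialize the AIProjectClient with the credential
project_client = AIProjectClient(
    endpoint=project_endpoint,
    credential=credential,
)

# _config is a private attribute; read it only for diagnostics
print(f"project client api version: {project_client._config.api_version}")
print(f"azure-ai-agents version: {agentslib.__version__}")
print(f"azure-ai-projects version: {projectslib.__version__}")
```

```
project client api version: 2025-05-15-preview
azure-ai-agents version: 1.1.0b4
azure-ai-projects version: 1.0.0b12
```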


```python
print("Creating a Red Team scan for direct model testing")

# Create the target configuration for testing an Azure OpenAI model deployment
target_configuration = AzureOpenAIModelConfiguration(
    model_deployment_name=azure_openai_deployment,
)

# Create the Red Team configuration: Base64 attacks against the violence risk category
red_team = RedTeam(
    attack_strategies=[AttackStrategy.BASE64],
    risk_categories=[RiskCategory.VIOLENCE],
    display_name="redteamtest1",
    target=target_configuration,
)
```

```
Creating a Red Team scan for direct model testing
```
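Because baseline prompts are sent by default alongside whatever strategies you list, even the single-strategy configuration above covers more than one (risk category, strategy) combination. The hypothetical helper below enumerates those combinations; it uses plain strings rather than the `RiskCategory` and `AttackStrategy` enums, and how many prompts the service actually sends per combination is not modeled here.

```python
from itertools import product
from typing import List, Tuple


def attack_matrix(risk_categories: List[str], attack_strategies: List[str]) -> List[Tuple[str, str]]:
    """Pair every risk category with 'baseline' plus each requested strategy."""
    # Baseline is always included; avoid listing it twice if it was requested explicitly
    strategies = ["baseline"] + [s for s in attack_strategies if s != "baseline"]
    return list(product(risk_categories, strategies))


pairs = attack_matrix(["violence"], ["base64"])
print(pairs)  # [('violence', 'baseline'), ('violence', 'base64')]
```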


```python
# Submit the scan; the target model's endpoint and key are passed as request headers
red_team_response: RedTeam = project_client.red_teams.create(
    red_team,
    headers={
        "model-endpoint": azure_openai_endpoint,
        # "api-key": azure_openai_api_key,
        "model-api-key": azure_openai_api_key,
    },
)

print(red_team_response)
```

```
ResourceNotFoundError: (404) Resource not found
Code: 404
Message: Resource not found
```
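One thing worth checking when `red_teams.create` returns 404 is that `AIProjectClient` received the Azure AI Foundry *project* endpoint (of the form `https://<resource>.services.ai.azure.com/api/projects/<project>`) rather than an Azure OpenAI account endpoint such as `https://<account_name>.cognitiveservices.azure.com/`. The heuristic below is a hypothetical helper, not an SDK function, and a URL passing it does not guarantee the project exists.

```python
from urllib.parse import urlparse


def looks_like_project_endpoint(url: str) -> bool:
    """Heuristic: Foundry project endpoints look like
    https://<resource>.services.ai.azure.com/api/projects/<project>."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    segments = [p for p in parsed.path.split("/") if p]
    return (
        parsed.scheme == "https"
        and host.endswith(".services.ai.azure.com")
        and len(segments) >= 3
        and segments[0] == "api"
        and segments[1] == "projects"
    )


print(looks_like_project_endpoint("https://my-resource.services.ai.azure.com/api/projects/my-project"))  # True
print(looks_like_project_endpoint("https://my-account.cognitiveservices.azure.com/"))  # False
```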

```python
print("Getting Red Team scan details")
# Use the name returned by the create operation for the get call
get_red_team_response = project_client.red_teams.get(name=red_team_response.name)
print(f"Red Team scan status: {get_red_team_response.status}")

print("Listing all Red Team scans")
for scan in project_client.red_teams.list():
    print(f"Found scan: {scan.name}, Status: {scan.status}")
```
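A single `get` call only returns a snapshot of the scan's status, so you may want to poll until the scan finishes. The sketch below is hypothetical: `wait_for_status` is not an SDK function, and the terminal status names in the commented usage are assumptions, so check the values your scans actually report. The status getter, sleep function, and clock are injected so the helper can be exercised without Azure.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_status(
    get_status: Callable[[], Optional[str]],
    terminal_statuses: Iterable[str],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status; raise TimeoutError otherwise."""
    terminal = {s.lower() for s in terminal_statuses}  # case-insensitive comparison
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status is not None and status.lower() in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"scan still {status!r} after {timeout_s} s")
        sleep(interval_s)


# Hypothetical usage against the service (status names are assumptions):
# final_status = wait_for_status(
#     lambda: project_client.red_teams.get(name=red_team_response.name).status,
#     terminal_statuses=["Completed", "Failed", "Canceled"],
# )

# Local demo with a fake sequence of statuses and a no-op sleep
fake_statuses = iter(["NotStarted", "Running", "Completed"])
demo_status = wait_for_status(lambda: next(fake_statuses), ["Completed", "Failed"], sleep=lambda _: None)
print(demo_status)  # Completed
```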