# Evaluate AI agents (Azure AI Agent Service) in Azure AI Foundry

## Objective


This sample demonstrates how to evaluate an AI agent (Azure AI Agent Service) on these important aspects of your agentic workflow:

- Intent Resolution: Measures how well the agent identifies the user’s request, including how well it scopes the user’s intent, asks clarifying questions, and reminds end users of its scope of capabilities.
- Tool Call Accuracy: Evaluates the agent's ability to select the appropriate tools, and process correct parameters from previous steps.
- Task Adherence: Measures how well the agent’s response adheres to its assigned tasks, according to its system message and prior steps.

For AI agents built outside Azure AI Agent Service, you can still provide the agent data in either of the two formats (simple data or agent messages) described in the individual evaluator samples:
- [Intent resolution](https://aka.ms/intentresolution-sample)
- [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample)
- [Task adherence](https://aka.ms/taskadherence-sample)
- [Response Completeness](https://aka.ms/rescompleteness-sample)
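
As a rough illustration of the two input shapes, the sketch below builds one record in each format. The agent-message record mirrors the converter output shown later in this notebook. The simple-data record assumes plain strings for `query` and `response`, and the helper `is_agent_message_format` is a hypothetical name introduced only for this example.

```python
# Simple data: query and response are plain strings (assumed shape).
simple_record = {
    "query": "Can you email me weather info for Seattle ?",
    "response": "To email you the weather information for Seattle, I need your email address.",
}

# Agent messages: query and response are lists of role-tagged messages.
agent_message_record = {
    "query": [
        {"role": "system", "content": "You are a helpful tourist assistant"},
        {
            "role": "user",
            "content": [{"type": "text", "text": "Can you email me weather info for Seattle ?"}],
        },
    ],
    "response": [
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "I need your email address."}],
        },
    ],
}


def is_agent_message_format(record: dict) -> bool:
    """Hypothetical helper: True when both fields are lists of messages with a role."""
    return all(
        isinstance(record.get(key), list)
        and all(isinstance(m, dict) and "role" in m for m in record[key])
        for key in ("query", "response")
    )


print(is_agent_message_format(simple_record))         # False
print(is_agent_message_format(agent_message_record))  # True
```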



## Time 

You should expect to spend about 20 minutes running this notebook. 

## Before you begin
Creating an agent with Azure AI Agent Service requires an Azure AI Foundry project and a deployed, supported model. For details, see [Create a new agent](https://learn.microsoft.com/azure/ai-services/agents/quickstart?pivots=ai-foundry-portal).

For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.

Important: Make sure to authenticate to Azure using `az login` in your terminal before running this notebook.

### Prerequisite

Before running the sample:
```bash
pip install azure-ai-projects azure-ai-agents azure-identity azure-ai-evaluation python-dotenv
```
Set these environment variables with your own values:
1) **PROJECT_CONNECTION_STRING** - The project connection string, as found in the overview page of your Azure AI Foundry project.
2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for AI-assisted evaluators, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
3) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to be used for evaluation.
4) **AZURE_OPENAI_API_KEY** - The Azure OpenAI key to be used for evaluation.
5) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to be used for evaluation.
6) **AZURE_SUBSCRIPTION_ID** - The Azure subscription ID of the Azure AI project.
7) **PROJECT_NAME** - The Azure AI project name.
8) **RESOURCE_GROUP_NAME** - The resource group name of the Azure AI project.
9) **AGENT_MODEL_DEPLOYMENT_NAME** - The deployment name of the model for your Azure AI agent, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
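
Before running the rest of the notebook, it can help to confirm that all nine variables are set. The following is a minimal sketch using only the standard library. The helper name `missing_env_vars` is an assumption for this example and is not part of any Azure SDK.

```python
import os

# Names taken from the list above.
REQUIRED_VARS = [
    "PROJECT_CONNECTION_STRING",
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_SUBSCRIPTION_ID",
    "PROJECT_NAME",
    "RESOURCE_GROUP_NAME",
    "AGENT_MODEL_DEPLOYMENT_NAME",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```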

### Set up Azure credentials and project
1. Use the Azure CLI (`az login`) to sign in to the tenant with your credentials.

<!-- initializing Project Client -->

In [1]:
from dotenv import load_dotenv

# load environment variables from .env file
load_dotenv(dotenv_path=".env", override=True)

from utils.fdyauth import AuthHelper
settings = AuthHelper.load_settings()
credential = AuthHelper.test_credential()

if credential:
    print('Environment and authentication OK')
else:
    print("please login first")

Environment and authentication OK


In [None]:
import os
import azure.ai.agents as agentslib
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import (
    AgentEvaluationRequest,
    InputDataset,
    EvaluatorIds,
    EvaluatorConfiguration,
    AgentEvaluationSamplingConfiguration,
    AgentEvaluationRedactionConfiguration,
    Evaluation,
    DatasetVersion,
    FileDatasetVersion,
)
from azure.ai.agents.models import (
    FunctionTool,
    ToolSet,
    MessageRole,
)

# Import your custom functions to be used as Tools for the Agent
from utils.user_functions import user_functions

# Initialize project client with proper authentication
project_client = AIProjectClient(
    credential=credential,  # Use the credential from earlier setup
    endpoint=settings.project_endpoint
)
print(f"project client api version: {project_client._config.api_version}")
print(f"azure-ai-agents version: {agentslib.__version__}")

AGENT_NAME = "Seattle Tourist Assistant"
AGENT_INSTRUCTIONS = """You are a helpful tourist assistant"""

# Add Tools to be used by Agent
functions = FunctionTool(user_functions)

toolset = ToolSet()
toolset.add(functions)

# To enable tool calls executed automatically
project_client.agents.enable_auto_function_calls(tools=toolset, max_retry=4)

project client api version: 2025-05-15-preview
azure-ai-agents version: 1.0.1
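
The `utils/user_functions.py` module is not shown in this notebook. As a minimal sketch of what such a module might contain, the example below defines one tool function whose name, description, and parameter follow the `convert_temperature` tool definition that appears in the converter output later on. The JSON return shape and the layout of the `user_functions` set are assumptions.

```python
import json
from typing import Any, Callable, Set


def convert_temperature(celsius: float) -> str:
    """Converts temperature from Celsius to Fahrenheit.

    :param celsius: Temperature in Celsius.
    :return: JSON string with the converted value (assumed return shape).
    """
    fahrenheit = celsius * 9 / 5 + 32
    return json.dumps({"fahrenheit": fahrenheit})


# Assumed layout: a set of callables that is handed to FunctionTool.
user_functions: Set[Callable[..., Any]] = {convert_temperature}

print(convert_temperature(20.0))
```

The docstring matters here: tool descriptions and parameter descriptions in the tool definitions appear to be derived from it.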


### Create an AI agent (Azure AI Agent Service)

In [3]:
found_agent = None
all_agents_list = project_client.agents.list_agents()
for a in all_agents_list:
    if a.name == AGENT_NAME:
        found_agent = a
        break

if found_agent:
    agent = project_client.agents.update_agent(
        agent_id=found_agent.id,
        model=settings.model_deployment_name,
        instructions=AGENT_INSTRUCTIONS,
        toolset=toolset,
    )
    project_client.agents.enable_auto_function_calls(tools=toolset, max_retry=4) 
    print(f"reusing agent > {agent.name} (id: {agent.id})")
else:
    agent = project_client.agents.create_agent(
        model=settings.model_deployment_name,
        name=AGENT_NAME,
        instructions=AGENT_INSTRUCTIONS,
        toolset=toolset,
    )
    print(f"Created agent '{AGENT_NAME}' with {len(functions._functions)} tools\nID: {agent.id}")

reusing agent > Seattle Tourist Assistant (id: asst_MSuuqYfXRGp34r33IF6DO0D0)


### Create Thread

In [4]:
thread = project_client.agents.threads.create()
print(f"Created thread, ID: {thread.id}")

Created thread, ID: thread_d4RuEKjvDHrKD0p5cejv97gC


## Conversation with Agent
Use the cells below to have a conversation with the agent:
- `Create Message[1]`
- `Execute[2]`

### Create Message[1]

In [5]:
# Create message to thread

MESSAGE = "Can you email me weather info for Seattle ?"

message = project_client.agents.messages.create(
    thread_id=thread.id,
    role=MessageRole.USER,
    content=MESSAGE,
)
print(f"Created message, ID: {message.id}")

Created message, ID: msg_OOgFPZybtKi8coe6zjcZvL6g


### Execute[2]

In [6]:
run = project_client.agents.runs.create_and_process(thread_id=thread.id, agent_id=agent.id)

print(f"Run finished with status: {run.status}")

if run.status == "failed":
    print(f"Run failed: {run.last_error}")

print(f"Run ID: {run.id}")

Run finished with status: RunStatus.COMPLETED
Run ID: run_wgW3TkmXHGI99wDH3Riq8cyU


### List Messages

In [7]:
for message in project_client.agents.messages.list(thread.id, order="asc"):
    print(f"Role: {message.role}")
    print(f"Content: {message.content[0].text.value}")
    print("-" * 40)

Role: MessageRole.USER
Content: Can you email me weather info for Seattle ?
----------------------------------------
Role: MessageRole.AGENT
Content: To email you the weather information for Seattle, I need your email address. Could you please provide it?
----------------------------------------


## Evaluate

### Get data from agent

In [21]:
from azure.ai.evaluation import AIAgentConverter
import json 

# Initialize the converter that will be backed by the project.
converter = AIAgentConverter(project_client)

thread_id = thread.id
run_id = run.id


# Get a single agent run data
evaluation_data_single_run = converter.convert(thread_id=thread_id, run_id=run_id)

# Save the agent thread data to a JSONL file for evaluation.
# A forward slash avoids the invalid "\e" escape sequence of a backslash-separated path.
evaluation_data = converter.prepare_evaluation_data(thread_ids=thread_id, filename="data/evaluation_data_1.jsonl")
print(json.dumps(evaluation_data, indent=4))


[
    {
        "query": [
            {
                "role": "system",
                "content": "You are a helpful tourist assistant"
            },
            {
                "createdAt": "2025-06-24T21:21:36Z",
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Can you email me weather info for Seattle ?"
                    }
                ]
            }
        ],
        "response": [
            {
                "createdAt": "2025-06-24T21:21:39Z",
                "run_id": "run_wgW3TkmXHGI99wDH3Riq8cyU",
                "role": "assistant",
                "content": [
                    {
                        "type": "text",
                        "text": "To email you the weather information for Seattle, I need your email address. Could you please provide it?"
                    }
                ]
            }
        ],
        "tool_definitions": [
  

In [9]:
evaluation_data_single_run

{'query': [{'role': 'system',
   'content': 'You are a helpful tourist assistant'},
  {'createdAt': '2025-06-24T21:21:36Z',
   'role': 'user',
   'content': [{'type': 'text',
     'text': 'Can you email me weather info for Seattle ?'}]}],
 'response': [{'createdAt': '2025-06-24T21:21:39Z',
   'run_id': 'run_wgW3TkmXHGI99wDH3Riq8cyU',
   'role': 'assistant',
   'content': [{'type': 'text',
     'text': 'To email you the weather information for Seattle, I need your email address. Could you please provide it?'}]}],
 'tool_definitions': [{'name': 'convert_temperature',
   'type': 'function',
   'description': 'Converts temperature from Celsius to Fahrenheit.',
   'parameters': {'type': 'object',
    'properties': {'celsius': {'type': 'number',
      'description': 'Temperature in Celsius.'}}}},
  {'name': 'send_email',
   'type': 'function',
   'description': 'Sends an email with the specified subject and body to the recipient.',
   'parameters': {'type': 'object',
    'properties': {'reci

### Setting up evaluator

We will select the following evaluators to assess the different aspects relevant for agent quality: 

- [Intent resolution](https://aka.ms/intentresolution-sample): measures the extent to which an agent identifies the correct intent from a user query. Scale: integer 1-5. Higher is better.
- [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample): evaluates the agent’s ability to select the appropriate tools and process correct parameters from previous steps. Scale: float 0-1. Higher is better.
- [Task adherence](https://aka.ms/taskadherence-sample): measures the extent to which an agent’s final response adheres to the task, based on its system message and a user query. Scale: integer 1-5. Higher is better.
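
To make the three scales concrete, here is a small sketch that checks whether a score falls inside the documented range for its metric. The metric keys and the helper `is_valid_score` are assumptions for illustration and are not part of `azure-ai-evaluation`.

```python
# (low, high, integer_only) per metric, taken from the scales listed above.
# The dictionary keys are assumed names, not confirmed output keys.
SCORE_SCALES = {
    "intent_resolution": (1, 5, True),
    "tool_call_accuracy": (0, 1, False),
    "task_adherence": (1, 5, True),
}


def is_valid_score(metric: str, score) -> bool:
    """Return True when the score lies on the documented scale for the metric."""
    low, high, integer_only = SCORE_SCALES[metric]
    # Reject non-numeric values; bool is excluded even though it subclasses int.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return False
    if integer_only and not float(score).is_integer():
        return False
    return low <= score <= high


print(is_valid_score("intent_resolution", 4))     # True
print(is_valid_score("intent_resolution", 4.5))   # False
print(is_valid_score("tool_call_accuracy", 0.5))  # True
```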


In [10]:
from azure.ai.evaluation import (
    ToolCallAccuracyEvaluator,
    AzureOpenAIModelConfiguration,
    IntentResolutionEvaluator,
    TaskAdherenceEvaluator,
)
from pprint import pprint

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=settings.azure_openai_endpoint,
    api_key=settings.azure_openai_api_key,
    api_version=settings.azure_openai_api_version,
    azure_deployment=settings.model_deployment_name,
)

intent_resolution = IntentResolutionEvaluator(model_config=model_config)

tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)

task_adherence = TaskAdherenceEvaluator(model_config=model_config)

Class IntentResolutionEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


Class ToolCallAccuracyEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class TaskAdherenceEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


In [11]:
# azure_ai_project = {
#     "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
#     "project_name": os.environ["PROJECT_NAME"],
#     "resource_group_name": os.environ["RESOURCE_GROUP_NAME"],
# }

agent_client = project_client.agents
agent_client._config.endpoint

project_client._config

<azure.ai.projects._configuration.AIProjectClientConfiguration at 0x1baf9cc1e50>

### Run Evaluator

In [12]:
# Resolve the evaluation data file relative to the current working directory
sub_dir = os.getcwd()
data_file_path = os.path.join(sub_dir, "data", "evaluation_data.jsonl")
# Fall back to an empty string when the file does not exist
data_file = data_file_path if os.path.exists(data_file_path) else ""

print(data_file)

c:\Users\yingdingwang\Documents\VCS\democollections\AgentEvalExamle\fdy\data\evaluation_data.jsonl
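
Before uploading the file, it may be worth checking that every line parses as JSON and carries the `query` and `response` fields seen in the converter output. The sketch below is one way to do that with the standard library. `load_jsonl` is a hypothetical helper, and the demo writes a throwaway file instead of reading the real dataset.

```python
import json
import os
import tempfile


def load_jsonl(path):
    """Read a JSONL file and verify that each record has query and response."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = json.loads(line)
            missing = [key for key in ("query", "response") if key not in row]
            if missing:
                raise ValueError(f"line {line_number} is missing {missing}")
            rows.append(row)
    return rows


# Demo on a temporary file so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "evaluation_data.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"query": "hi", "response": "hello"}) + "\n")
    print(len(load_jsonl(demo_path)))  # 1
```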


In [24]:
print("Upload a single file and create a new Dataset to reference the file.")
dataset_name = "tourist-test-dataset"
dataset_version = "1.0"
# dataset: DatasetVersion = project_client.datasets.upload_file(
#     name=dataset_name,
#     version=dataset_version,
#     file_path=data_file,
# )

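# Note from the original run: the project's managed identity needs permissions
# (noted as "storage account contributor") on the storage account stacctaievalywuno;
# without them the upload fails with the 403 error shown in the output.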
dataset: DatasetVersion = project_client.datasets.upload_file(
    name=dataset_name,
    version=dataset_version,
    file_path=data_file,
    connection_name=""
)
print(dataset)

Upload a single file and create a new Dataset to reference the file.


HttpResponseError: (UserError) Service invocation failed!
Request: POST eastus2.api.azureml.ms/assetstore/v1.0/temporaryDataReference/createOrGet
Status Code: 403 Forbidden
Error Code: UserError/Auth/Authorization/ResourceMsiTokenDoesntHavePermissionsOnStorage
Reason Phrase: Foundry project MSI foundry-proj-yw-uno-resource/foundry-proj-yw-uno doesn't have appropriate permis
Response Body: {
  "error": {
    "code": "UserError",
    "severity": null,
    "message": "Foundry project MSI foundry-proj-yw-uno-resource/foundry-proj-yw-uno doesn't have appropriate permissions on the storage account stacctaievalywuno",
    "messageFormat": "Foundry project MSI {foundryAccountName}/{projectName} doesn't have appropriate permissions on the storage account {storageAccount}",
    "messageParameters": {
      "foundryAccountName": "foundry-proj-yw-uno-resource",
      "projectName": "foundry-proj-yw-uno",
      "storageAccount": "stacctaievalywuno"
    },
    "referenceCode": null,
    "detailsUri": null,
    "target": null,
    "details": [],
    "innerError": {
      "code": "Auth",
      "innerError": {
        "code": "Authorization",
        "innerError": {
          "code": "ResourceMsiTokenDoesntHavePermissionsOnStorage",
          "innerError": null
        }
      }
    },
    "debugInfo": null,
    "additionalInfo": null
  },
  "correlation": {
    "operation": "3f653d38e38bbef897565c5f7062a321",
    "request": "fd80f39245439ad8"
  },
  "environment": "eastus2",
  "location": "eastus2",
  "time": "2025-06-24T22:35:14.705557+00:00",
  "componentName": "assetstore",
  "statusCode": 403
}

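The `Error Code` line in the output above is the chain of nested `innerError` codes joined with `/`. As a small sketch of how that chain can be recovered from a response body, the helper below walks the nesting. `error_code_path` is a hypothetical name, and the sample body is trimmed to the `code` and `innerError` fields of the response shown above.

```python
import json


def error_code_path(error: dict) -> str:
    """Join the code of each nested innerError level with '/'."""
    codes = []
    node = error
    while isinstance(node, dict) and node.get("code"):
        codes.append(node["code"])
        node = node.get("innerError")
    return "/".join(codes)


# Trimmed copy of the response body from the failed upload.
response_body = json.loads("""
{
  "error": {
    "code": "UserError",
    "innerError": {
      "code": "Auth",
      "innerError": {
        "code": "Authorization",
        "innerError": {
          "code": "ResourceMsiTokenDoesntHavePermissionsOnStorage",
          "innerError": null
        }
      }
    }
  }
}
""")

print(error_code_path(response_body["error"]))
# UserError/Auth/Authorization/ResourceMsiTokenDoesntHavePermissionsOnStorage
```
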
### Agent Evaluation

In [17]:
agent_evaluation_request = AgentEvaluationRequest(
    run_id=run_id,
    thread_id=thread_id,
    evaluators={
        "violence": EvaluatorConfiguration(id=EvaluatorIds.VIOLENCE),
    },
    sampling_configuration=AgentEvaluationSamplingConfiguration(
        name="test",
        sampling_percent=100,
        max_request_rate=100,
    ),
    redaction_configuration=AgentEvaluationRedactionConfiguration(
        redact_score_properties=False,
    ),
    app_insights_connection_string=project_client.telemetry.get_connection_string(),
)

agent_evaluation_response = project_client.evaluations.create_agent_evaluation(
    evaluation=agent_evaluation_request,
)

In [19]:
print(agent_evaluation_response)

{'id': 'thread_d4RuEKjvDHrKD0p5cejv97gC;run_wgW3TkmXHGI99wDH3Riq8cyU', 'status': 'Running', 'result': None, 'error': None}
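
The response above reports the status `Running`, so the evaluation completes asynchronously. This notebook does not show how the status is fetched later, so the sketch below only illustrates a generic polling loop. `fetch_status` is a hypothetical callable that you would need to supply, and `Completed` is an assumed status value used by the fake in the demo.

```python
import time
from typing import Callable


def wait_for_completion(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
) -> str:
    """Poll fetch_status until it returns something other than 'Running'."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status != "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("evaluation still running after timeout")
        time.sleep(poll_seconds)


# Demo with a fake status source; replace it with a real status lookup.
fake_statuses = iter(["Running", "Running", "Completed"])
print(wait_for_completion(lambda: next(fake_statuses), poll_seconds=0.01))  # Completed
```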


### Run PromptFlow Evaluator

In [None]:
# from azure.ai.evaluation import evaluate

# response = evaluate(
#     data=data_file,
#     evaluators={
#         "tool_call_accuracy": tool_call_accuracy,
#         "intent_resolution": intent_resolution,
#         "task_adherence": task_adherence,
#     },
    
#     azure_ai_project={
#         "subscription_id": settings.azure_subscription_id,
#         "project_name": settings.project_name,
#         "resource_group_name": settings.resource_group_name,
#     },
# )
# pprint(f'AI Foundry URL: {response.get("studio_url")}')

## Inspect results on Azure AI Foundry

Open the AI Foundry URL to explore the evaluation results in Azure AI Foundry. The visualizations show the evaluation scores and the reasoning behind them, which helps you quickly identify issues with your agent and fix them.

In [None]:
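# Alternatively, inspect the evaluation results in memory.
# Note: `response` is the return value of the evaluate() call in the previous cell,
# so that call must be uncommented and run first.

# Average scores across all runs
pprint(response["metrics"])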
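
To show what "average scores across all runs" means, here is a sketch that averages per-row scores by hand. The rows and metric keys below are made-up illustrations and not actual `evaluate()` output, and `average_scores` is a hypothetical helper.

```python
from statistics import mean


def average_scores(rows, metric_keys):
    """Average each metric over the rows that have a numeric value for it."""
    averages = {}
    for key in metric_keys:
        values = [row[key] for row in rows if isinstance(row.get(key), (int, float))]
        if values:
            averages[key] = mean(values)
    return averages


# Illustrative rows only; real rows come from the evaluation result.
rows = [
    {"intent_resolution": 4, "task_adherence": 3},
    {"intent_resolution": 5, "task_adherence": None},
]
print(average_scores(rows, ["intent_resolution", "task_adherence"]))
```

Rows with a missing or non-numeric score are skipped, so a failed evaluation of one row does not distort the average of the others.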