# Trace Your AI Agents Using Azure AI Foundry SDK (Preview)

This notebook shows how to add tracing to agents built with the Azure AI Foundry SDK, using OpenTelemetry to create spans and Azure Monitor Application Insights to store and inspect them.

## Key Concepts

- **Traces**: Capture the journey of a request or workflow through your application
- **Spans**: Building blocks of traces, representing single operations within a trace
- **Attributes**: Key-value pairs providing contextual metadata
- **Semantic conventions**: Standardized names and formats for trace data
- **Trace exporters**: Send trace data to backend systems for storage and analysis
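
The relationship between these concepts can be sketched with a toy data model. This is only an illustration: `ToySpan` and `children_of` are hypothetical names invented here, not OpenTelemetry classes, and the IDs are placeholder strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySpan:
    """Hypothetical stand-in for a span: one named operation within a trace."""
    name: str
    trace_id: str                      # shared by every span in the same trace
    span_id: str                       # unique per span
    parent_id: Optional[str] = None    # None marks the root span of the trace
    attributes: dict = field(default_factory=dict)  # key-value metadata


def children_of(parent: ToySpan, spans: List[ToySpan]) -> List[ToySpan]:
    """Return the spans whose parent is `parent`."""
    return [s for s in spans if s.parent_id == parent.span_id]


# One trace (same trace_id) made of a root span and one child span.
root = ToySpan("agent-run", trace_id="t1", span_id="s1")
child = ToySpan(
    "create-thread",
    trace_id="t1",
    span_id="s2",
    parent_id=root.span_id,
    attributes={"thread.id": "thread_abc"},
)
print([s.name for s in children_of(root, [root, child])])
```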

## Environment Setup

Configure environment variables and import necessary libraries.

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Enable content recording (optional - may contain personal data)
os.environ["AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED"] = "true"  # False by default

# Set service name for identification in Application Insights
os.environ["OTEL_SERVICE_NAME"] = "azure-ai-agents-tracing-demo"

# Verify required environment variables
required_vars = ["PROJECT_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT_NAME"]
for var in required_vars:
    # Treat empty values as missing: an empty endpoint or deployment name is unusable
    if not os.environ.get(var):
        print(f"Warning: {var} environment variable is not set")
    else:
        print(f"{var} is configured")
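
If you would rather fail fast than print warnings, the same check can be wrapped in a small helper that returns the missing names. This is a minimal sketch: `missing_env_vars` is a hypothetical helper, and the endpoint value in the example is a placeholder.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    required: Iterable[str], environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are unset or blank in `environ`."""
    return [name for name in required if not environ.get(name, "").strip()]


# Example with a fake environment so the check is easy to follow.
fake_env = {
    "PROJECT_ENDPOINT": "https://example.invalid/api/projects/demo",
    "AZURE_OPENAI_DEPLOYMENT_NAME": "",
}
missing = missing_env_vars(
    ["PROJECT_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT_NAME"], fake_env
)
print(missing)
```

A caller could raise an error when the returned list is non-empty instead of continuing with a half-configured notebook.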

## Import Required Libraries

Import all necessary libraries for Azure AI Foundry, OpenTelemetry, and tracing.

In [None]:
# Azure AI imports
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.agents.telemetry import AIAgentsInstrumentor

# OpenTelemetry imports
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry.trace import SpanKind

# Azure Monitor imports
from azure.monitor.opentelemetry import configure_azure_monitor

# Core tracing settings
from azure.core.settings import settings
settings.tracing_implementation = "opentelemetry"

print(" ✅ All libraries imported successfully")

## Option 1: Local Console Tracing Setup

Configure tracing to output to console for local development and debugging.

In [None]:
def setup_console_tracing():
    """Set up tracing to the console for local development."""
    # SimpleSpanProcessor exports each span as soon as it ends, which suits local debugging
    span_exporter = ConsoleSpanExporter()
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter))
    trace.set_tracer_provider(tracer_provider)
    
    print(" Console tracing configured")
    return trace.get_tracer(__name__)

# Uncomment the line below to enable console tracing
# console_tracer = setup_console_tracing()

## Option 2: Azure Monitor Tracing Setup

Configure tracing to send data to Azure Monitor Application Insights.

In [None]:
def setup_azure_monitor_tracing(project_client):
    """Set up tracing to Azure Monitor Application Insights."""
    try:
        # Get connection string from the Application Insights resource
        connection_string = project_client.telemetry.get_application_insights_connection_string()
        
        # Configure Azure Monitor
        configure_azure_monitor(connection_string=connection_string)
        
        # Do not print the connection string: it contains the instrumentation key
        print(" Azure Monitor tracing configured")
        
        return trace.get_tracer(__name__)
    except Exception as e:
        print(f" ❌ Failed to set up Azure Monitor tracing: {e}")
        print(" Make sure Application Insights is connected to your AI Foundry project")
        return None
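
When you do need to log which Application Insights resource is in use, one option is to mask the secret part of the connection string first. The sketch below assumes the usual semicolon-separated `Key=Value` layout; `redact_connection_string` is a hypothetical helper and the sample string is made up.

```python
from typing import Iterable


def redact_connection_string(
    connection_string: str, secret_keys: Iterable[str] = ("InstrumentationKey",)
) -> str:
    """Replace the values of secret keys with '***', keeping other segments."""
    secrets = {key.lower() for key in secret_keys}
    parts = []
    for segment in connection_string.split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        key, sep, value = segment.partition("=")
        if sep and key.strip().lower() in secrets:
            value = "***"
        parts.append(f"{key}{sep}{value}")
    return ";".join(parts)


sample = (
    "InstrumentationKey=00000000-0000-0000-0000-000000000000;"
    "IngestionEndpoint=https://example.invalid/"
)
print(redact_connection_string(sample))
```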

## Initialize Azure AI Project Client

Create the AI Project Client to connect to your Azure AI Foundry project.

In [None]:
# Initialize Azure AI Project Client
try:
    project_client = AIProjectClient(
        credential=DefaultAzureCredential(),
        endpoint=os.environ["PROJECT_ENDPOINT"],
    )
    print(" ✅ Azure AI Project Client initialized successfully")
    print(f" Project endpoint: {os.environ['PROJECT_ENDPOINT']}")
except Exception as e:
    print(f" ❌ Failed to initialize project client: {e}")
    raise

## Setup Tracing Configuration

Choose between console tracing (local) or Azure Monitor tracing (cloud).

In [None]:
# Choose tracing method
USE_AZURE_MONITOR = True  # Set to False for console tracing

if USE_AZURE_MONITOR:
    tracer = setup_azure_monitor_tracing(project_client)
    if tracer is None:
        print(" Falling back to console tracing")
        tracer = setup_console_tracing()
else:
    tracer = setup_console_tracing()

# Enable AI Agents instrumentation
AIAgentsInstrumentor().instrument()
print(" AI Agents instrumentation enabled")

## Basic Agent with Tracing Example

Create and run an AI agent with tracing enabled.

In [None]:
def create_and_run_agent_with_tracing():
    """Create and run an AI agent with comprehensive tracing"""
    
    with tracer.start_as_current_span("agent-creation-and-execution") as main_span:
        # Add attributes to the main span
        main_span.set_attribute("operation.type", "agent_execution")
        main_span.set_attribute("model.deployment", os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"])
        
        try:
            # Create agent
            with tracer.start_as_current_span("create-agent") as agent_span:
                # Define name and instructions once so the span records exactly what the agent receives
                agent_name = "helpful-assistant"
                agent_instructions = "You are a helpful assistant that provides clear and concise answers."
                agent_span.set_attribute("agent.name", agent_name)
                agent_span.set_attribute("agent.instructions", agent_instructions)
                
                agent = project_client.agents.create_agent(
                    model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
                    name=agent_name,
                    instructions=agent_instructions
                )
                
                agent_span.set_attribute("agent.id", agent.id)
                print(f" Agent created with ID: {agent.id}")
            
            # Create thread
            with tracer.start_as_current_span("create-thread") as thread_span:
                thread = project_client.agents.threads.create()
                thread_span.set_attribute("thread.id", thread.id)
                print(f" Thread created with ID: {thread.id}")
            
            # Create message
            with tracer.start_as_current_span("create-message") as message_span:
                user_message = "Tell me a joke about programming"
                message_span.set_attribute("message.role", "user")
                message_span.set_attribute("message.content", user_message)
                
                message = project_client.agents.messages.create(
                    thread_id=thread.id,
                    role="user",
                    content=user_message
                )
                
                message_span.set_attribute("message.id", message.id)
                print(f" Message created: {user_message}")
            
            # Run agent
            with tracer.start_as_current_span("run-agent") as run_span:
                run_span.set_attribute("run.agent_id", agent.id)
                run_span.set_attribute("run.thread_id", thread.id)
                
                run = project_client.agents.runs.create_and_process(
                    thread_id=thread.id,
                    agent_id=agent.id
                )
                
                run_span.set_attribute("run.id", run.id)
                run_span.set_attribute("run.status", run.status)
                print(f" Run completed with status: {run.status}")
            
            # Get response
            with tracer.start_as_current_span("get-response") as response_span:
                messages = project_client.agents.messages.list(thread_id=thread.id)
                
                # Find the assistant's response
                assistant_messages = [msg for msg in messages if msg.role == "assistant"]
                if assistant_messages:
                    latest_response = assistant_messages[0]
                    response_content = latest_response.content[0].text.value
                    
                    response_span.set_attribute("response.content", response_content)
                    response_span.set_attribute("response.message_id", latest_response.id)
                    
                    print("\n Assistant Response:")
                    print(response_content)
                    
                    return {
                        "agent_id": agent.id,
                        "thread_id": thread.id,
                        "run_id": run.id,
                        "response": response_content
                    }
                else:
                    print(" No assistant response found")
                    return None
                    
        except Exception as e:
            main_span.set_attribute("error.message", str(e))
            main_span.set_attribute("error.type", type(e).__name__)
            print(f" ❌ Error in agent execution: {e}")
            raise

# Run the agent with tracing
result = create_and_run_agent_with_tracing()
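
The response lookup above indexes `content[0].text.value` directly, which assumes the first content part is text. A more defensive lookup can be sketched as follows. `first_assistant_text` is a hypothetical helper, and the fake messages only mimic the attribute shape used in the cell above (`role`, `content`, `text.value`); they are not SDK objects.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def first_assistant_text(messages: Iterable) -> Optional[str]:
    """Return the first non-empty text part of the first assistant message that has one."""
    for msg in messages:
        if msg.role != "assistant":
            continue
        for part in msg.content or []:
            # Non-text parts (for example images) have no `text` attribute here.
            text = getattr(part, "text", None)
            value = getattr(text, "value", None)
            if value:
                return value
    return None


# Fake messages that mimic the shape used in this notebook.
fake_messages = [
    SimpleNamespace(role="assistant", content=[
        SimpleNamespace(image_file="img_1"),
        SimpleNamespace(text=SimpleNamespace(value="Why do programmers prefer dark mode?")),
    ]),
    SimpleNamespace(role="user", content=[
        SimpleNamespace(text=SimpleNamespace(value="Tell me a joke about programming")),
    ]),
]
print(first_assistant_text(fake_messages))
```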

## Custom Function Tracing

Demonstrate how to trace custom functions with detailed attributes.

In [None]:
import time


def process_user_input(user_input: str) -> dict:
    """Custom function with tracing to process user input"""
    
    with tracer.start_as_current_span("process-user-input") as span:
        span.set_attribute("input.length", len(user_input))
        span.set_attribute("input.type", "text")
        
        # Simulate processing time
        time.sleep(0.1)
        
        # Extract some metadata
        word_count = len(user_input.split())
        has_question = "?" in user_input
        has_exclamation = "!" in user_input
        
        # Add processing attributes
        span.set_attribute("processing.word_count", word_count)
        span.set_attribute("processing.has_question", has_question)
        span.set_attribute("processing.has_exclamation", has_exclamation)
        
        result = {
            "original_input": user_input,
            "word_count": word_count,
            "sentiment": "curious" if has_question else "excited" if has_exclamation else "neutral",
            "processed_at": time.time()
        }
        
        span.set_attribute("output.sentiment", result["sentiment"])
        
        return result

def analyze_conversation_flow(messages: list) -> dict:
    """Analyze conversation flow with nested tracing"""
    
    with tracer.start_as_current_span("analyze-conversation-flow") as span:
        span.set_attribute("conversation.message_count", len(messages))
        
        analysis = {
            "total_messages": len(messages),
            "user_messages": 0,
            "assistant_messages": 0,
            "average_length": 0
        }
        
        total_length = 0
        
        for i, message in enumerate(messages):
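            # Keep the span name constant (low cardinality) and record the index as an attribute
            with tracer.start_as_current_span("analyze-message") as msg_span:
                msg_span.set_attribute("message.index", i)
                msg_span.set_attribute("message.role", message.get("role", "unknown"))
                msg_span.set_attribute("message.length", len(message.get("content", "")))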
                
                if message.get("role") == "user":
                    analysis["user_messages"] += 1
                elif message.get("role") == "assistant":
                    analysis["assistant_messages"] += 1
                
                total_length += len(message.get("content", ""))
        
        if len(messages) > 0:
            analysis["average_length"] = total_length / len(messages)
        
        # Add analysis results to span
        for key, value in analysis.items():
            span.set_attribute(f"analysis.{key}", value)
        
        return analysis

# Test custom function tracing
print(" Testing custom function tracing...")

# Process user input
user_input = "How can I improve my Python coding skills?"
processed_input = process_user_input(user_input)
print(f"Processed input: {processed_input}")

# Analyze conversation (mock data)
mock_conversation = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing well, thank you! How can I help you today?"},
    {"role": "user", "content": "Can you explain machine learning?"},
    {"role": "assistant", "content": "Machine learning is a subset of artificial intelligence..."}
]

conversation_analysis = analyze_conversation_flow(mock_conversation)
print(f"Conversation analysis: {conversation_analysis}")
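
Span attribute values in OpenTelemetry are limited to strings, booleans, integers, floats, and homogeneous sequences of those, so values such as `None`, dictionaries, or enum members are worth normalizing before calling `set_attribute`. The sketch below shows one way to do that; `to_attribute_value` and `DemoStatus` are hypothetical names, and falling back to `str()` is a design choice made here, not a library rule.

```python
from enum import Enum
from typing import Any

_PRIMITIVES = (str, bool, int, float)


def to_attribute_value(value: Any):
    """Coerce `value` into a type that a span attribute can hold."""
    if isinstance(value, Enum):
        value = value.value  # e.g. a status enum becomes its underlying value
    if isinstance(value, _PRIMITIVES):
        return value
    if isinstance(value, (list, tuple)) and value:
        first_type = type(value[0])
        # Sequences are kept only when every item has the same primitive type.
        if first_type in _PRIMITIVES and all(type(v) is first_type for v in value):
            return list(value)
    return str(value)  # fallback for None, dicts, mixed lists, and other objects


class DemoStatus(str, Enum):
    COMPLETED = "completed"


print(to_attribute_value(DemoStatus.COMPLETED))
print(to_attribute_value({"total_messages": 4}))
```

With this helper, a loop like the one in `analyze_conversation_flow` could call `span.set_attribute(key, to_attribute_value(value))` without worrying about the value type.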

## Simulating User Feedback Tracing

Demonstrate how to attach user feedback to traces for better observability.

In [None]:
def simulate_user_feedback(response_id: str, feedback_data: dict):
    """Simulate user feedback collection and tracing"""
    
    with tracer.start_as_current_span("user-feedback") as feedback_span:
        # Set feedback attributes; gen_ai.response.id links the feedback to the traced response
        feedback_span.set_attribute("gen_ai.user.feedback.rating", feedback_data.get("rating", 0))
        feedback_span.set_attribute("gen_ai.user.feedback.comment", feedback_data.get("comment", ""))
        feedback_span.set_attribute("gen_ai.response.id", response_id)
        feedback_span.set_attribute("feedback.timestamp", feedback_data.get("timestamp", ""))
        feedback_span.set_attribute("feedback.user_id", feedback_data.get("user_id", "anonymous"))
        
        # Add custom attributes
        if "helpful" in feedback_data:
            feedback_span.set_attribute("feedback.helpful", feedback_data["helpful"])
        
        if "accuracy" in feedback_data:
            feedback_span.set_attribute("feedback.accuracy", feedback_data["accuracy"])
        
        print(f" User feedback recorded for response: {response_id}")
        print(f"   Rating: {feedback_data.get('rating')}/5")
        print(f"   Comment: {feedback_data.get('comment')}")
        
        return feedback_span.get_span_context().span_id

# Simulate user feedback for the previous agent response
if result:
    import time
    
    feedback_examples = [
        {
            "rating": 5,
            "comment": "Great joke! Very funny and programming-related.",
            "helpful": True,
            "accuracy": 5,
            "timestamp": str(int(time.time())),
            "user_id": "user_123"
        },
        {
            "rating": 3,
            "comment": "The joke was okay, but could be funnier.",
            "helpful": True,
            "accuracy": 4,
            "timestamp": str(int(time.time())),
            "user_id": "user_456"
        }
    ]
    
    for i, feedback in enumerate(feedback_examples):
        print(f"\n Simulating feedback {i+1}:")
        feedback_id = simulate_user_feedback(result["run_id"], feedback)
else:
    print("  No agent result available for feedback simulation")
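
Feedback usually arrives from a user interface, so it can help to validate it before it reaches a span. The following sketch builds the attribute dictionary separately from the tracing call. `build_feedback_attributes` and `span_id_hex` are hypothetical helpers; the attribute names are copied from the cell above, and the 1 to 5 rating scale is taken from the `/5` shown in its output. The hex formatting produces the 16-character form in which span IDs are commonly displayed.

```python
from typing import Any, Dict, Mapping


def build_feedback_attributes(
    response_id: str, feedback: Mapping[str, Any]
) -> Dict[str, Any]:
    """Validate feedback and return span attributes, omitting empty values."""
    rating = feedback.get("rating")
    if not isinstance(rating, int) or isinstance(rating, bool) or not 1 <= rating <= 5:
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    attributes: Dict[str, Any] = {
        "gen_ai.response.id": response_id,
        "gen_ai.user.feedback.rating": rating,
    }
    comment = feedback.get("comment")
    if comment:
        attributes["gen_ai.user.feedback.comment"] = comment
    return attributes


def span_id_hex(span_id: int) -> str:
    """Format an integer span ID as a zero-padded 16-character hex string."""
    return format(span_id, "016x")


attrs = build_feedback_attributes("run_123", {"rating": 5, "comment": "Great joke!"})
print(attrs)
print(span_id_hex(255))
```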

## Advanced Tracing: Multi-Agent Workflow

Demonstrate tracing in a more complex scenario with multiple agents.

In [None]:
def multi_agent_workflow():
    """Demonstrate tracing with multiple agents in a workflow"""
    
    with tracer.start_as_current_span("multi-agent-workflow") as workflow_span:
        workflow_span.set_attribute("workflow.type", "multi_agent")
        workflow_span.set_attribute("workflow.agents_count", 2)
        
        try:
            # Create first agent (Research Assistant)
            with tracer.start_as_current_span("create-research-agent") as research_span:
                research_agent = project_client.agents.create_agent(
                    model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
                    name="research-assistant",
                    instructions="You are a research assistant that provides factual information and sources."
                )
                research_span.set_attribute("agent.type", "research")
                research_span.set_attribute("agent.id", research_agent.id)
                print(f" Research agent created: {research_agent.id}")
            
            # Create second agent (Writing Assistant)
            with tracer.start_as_current_span("create-writing-agent") as writing_span:
                writing_agent = project_client.agents.create_agent(
                    model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
                    name="writing-assistant",
                    instructions="You are a writing assistant that creates well-structured content based on research."
                )
                writing_span.set_attribute("agent.type", "writing")
                writing_span.set_attribute("agent.id", writing_agent.id)
                print(f" Writing agent created: {writing_agent.id}")
            
            # Research phase
            with tracer.start_as_current_span("research-phase") as research_phase_span:
                research_thread = project_client.agents.threads.create()
                research_query = "What are the key benefits of using OpenTelemetry for observability?"
                
                research_phase_span.set_attribute("phase.name", "research")
                research_phase_span.set_attribute("query", research_query)
                
                # Send query to research agent
                project_client.agents.messages.create(
                    thread_id=research_thread.id,
                    role="user",
                    content=research_query
                )
                
                research_run = project_client.agents.runs.create_and_process(
                    thread_id=research_thread.id,
                    agent_id=research_agent.id
                )
                
                research_phase_span.set_attribute("run.status", research_run.status)
                print(f" Research phase completed: {research_run.status}")
                
                # Get research results
                research_messages = project_client.agents.messages.list(thread_id=research_thread.id)
                research_result = None
                for msg in research_messages:
                    if msg.role == "assistant":
                        research_result = msg.content[0].text.value
                        break
            
            # Writing phase - only proceed if we have research results
            final_result = None
            if research_result:
                with tracer.start_as_current_span("writing-phase") as writing_phase_span:
                    writing_thread = project_client.agents.threads.create()
                    # Truncate research result to avoid overwhelming the writing agent
                    truncated_research = research_result[:1000] + "..." if len(research_result) > 1000 else research_result
                    writing_prompt = f"Based on this research: {truncated_research}\n\nPlease write a concise summary in bullet points."
                    
                    writing_phase_span.set_attribute("phase.name", "writing")
                    writing_phase_span.set_attribute("input.source", "research_agent")
                    writing_phase_span.set_attribute("input.length", len(truncated_research))
                    
                    # Send research to writing agent
                    project_client.agents.messages.create(
                        thread_id=writing_thread.id,
                        role="user",
                        content=writing_prompt
                    )
                    
                    writing_run = project_client.agents.runs.create_and_process(
                        thread_id=writing_thread.id,
                        agent_id=writing_agent.id
                    )
                    
                    writing_phase_span.set_attribute("run.status", writing_run.status)
                    print(f" Writing phase completed: {writing_run.status}")
                    
                    # Get final result
                    writing_messages = project_client.agents.messages.list(thread_id=writing_thread.id)
                    for msg in writing_messages:
                        if msg.role == "assistant":
                            final_result = msg.content[0].text.value
                            break
            else:
                print(" Skipping writing phase - no research result available")
            
            workflow_span.set_attribute("workflow.status", "completed")
            
            print("\n Multi-Agent Workflow Results:")
            print("=" * 50)
            print(f"Research Query: {research_query}")
            
            # Guard against missing results before slicing
            if research_result:
                print(f"\n Research Result:\n{research_result[:200]}...")
            else:
                print("\n Research Result: No result available (agent may have failed)")
                
            if final_result:
                print(f"\n Final Summary:\n{final_result}")
            else:
                print("\n Final Summary: No result available (writing agent skipped or failed)")
            
            return {
                "research_agent_id": research_agent.id,
                "writing_agent_id": writing_agent.id,
                "research_result": research_result,
                "final_result": final_result
            }
            
        except Exception as e:
            workflow_span.set_attribute("workflow.status", "failed")
            workflow_span.set_attribute("error.message", str(e))
            print(f" ❌ Multi-agent workflow failed: {e}")
            raise

# Run multi-agent workflow
print(" Starting multi-agent workflow...")
workflow_result = multi_agent_workflow()
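
The workflow above records `workflow.status` as `"completed"` whenever no exception is raised, even if a run ended in another state or the writing phase was skipped. A status derived from the individual runs can be sketched like this. `summarize_workflow_status` is a hypothetical helper, and the status strings `"completed"` and `"failed"` are assumed to match what `run.status` reports.

```python
from typing import Mapping


def summarize_workflow_status(phase_statuses: Mapping[str, object]) -> str:
    """Reduce per-phase run statuses to a single workflow-level status."""
    if not phase_statuses:
        return "skipped"
    # Accept plain strings as well as enum members that expose a `.value`.
    values = [str(getattr(status, "value", status)) for status in phase_statuses.values()]
    if all(value == "completed" for value in values):
        return "completed"
    if any(value == "failed" for value in values):
        return "failed"
    return "incomplete"


print(summarize_workflow_status({"research": "completed", "writing": "completed"}))
print(summarize_workflow_status({"research": "failed"}))
```

The returned string could be passed to `workflow_span.set_attribute("workflow.status", ...)` in place of the fixed value.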

## Tracing Best Practices and Tips

Demonstrate best practices for effective tracing.

In [None]:
def demonstrate_tracing_best_practices():
    """Demonstrate tracing best practices"""
    
    print(" Tracing Best Practices Demo")
    print("=" * 40)
    
    # 1. Use meaningful span names
    with tracer.start_as_current_span("user-authentication-flow") as auth_span:
        auth_span.set_attribute("user.action", "login_attempt")
        auth_span.set_attribute("auth.method", "azure_ad")
        
        # 2. Add contextual attributes
        import time
        start_time = time.perf_counter()  # monotonic clock, suited to measuring durations
        
        # Simulate authentication
        time.sleep(0.1)
        
        end_time = time.perf_counter()
        auth_span.set_attribute("auth.duration_ms", (end_time - start_time) * 1000)
        auth_span.set_attribute("auth.success", True)
        
        print(" Authentication traced with detailed attributes")
    
    # 3. Use nested spans for complex operations
    with tracer.start_as_current_span("data-processing-pipeline") as pipeline_span:
        pipeline_span.set_attribute("pipeline.type", "text_analysis")
        
        # Step 1: Data validation
        with tracer.start_as_current_span("validate-input") as validation_span:
            validation_span.set_attribute("validation.rules_count", 5)
            validation_span.set_attribute("validation.passed", True)
            print("   Input validation traced")
        
        # Step 2: Data transformation
        with tracer.start_as_current_span("transform-data") as transform_span:
            transform_span.set_attribute("transform.type", "text_normalization")
            transform_span.set_attribute("transform.input_size", 1024)
            transform_span.set_attribute("transform.output_size", 987)
            print("   Data transformation traced")
        
        # Step 3: Analysis
        with tracer.start_as_current_span("analyze-content") as analysis_span:
            analysis_span.set_attribute("analysis.model", "sentiment_analyzer_v2")
            analysis_span.set_attribute("analysis.confidence", 0.95)
            analysis_span.set_attribute("analysis.result", "positive")
            print("   Content analysis traced")
        
        pipeline_span.set_attribute("pipeline.steps_completed", 3)
        pipeline_span.set_attribute("pipeline.success", True)
        
        print(" ✅ Complete pipeline traced with nested spans")
    
    # 4. Error handling with tracing
    with tracer.start_as_current_span("error-handling-demo") as error_span:
        try:
            # Simulate an operation that might fail
            error_span.set_attribute("operation.type", "risky_operation")
            
            # This will not actually fail for demo purposes
            success = True
            
            if success:
                error_span.set_attribute("operation.result", "success")
                print(" Error handling pattern demonstrated (success case)")
            else:
                raise ValueError("Simulated error")
                
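        except Exception as e:
            # The exception is handled here, so mark the span as failed explicitly
            error_span.record_exception(e)
            error_span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
            error_span.set_attribute("error.type", type(e).__name__)
            error_span.set_attribute("error.message", str(e))
            error_span.set_attribute("operation.result", "failure")
            print(f" ❌ Error traced: {e}")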
    
    print("\n Key Takeaways:")
    print("   • Use descriptive span names")
    print("   • Add relevant attributes for context")
    print("   • Use nested spans for complex workflows")
    print("   • Always trace errors with details")
    print("   • Include timing and performance metrics")
    print("   • Use semantic conventions when possible")

demonstrate_tracing_best_practices()
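
The timing pattern from the authentication example can be factored into a reusable context manager. This is a sketch under stated assumptions: `measure_ms` is a hypothetical helper that writes into a plain dictionary, which you would then copy onto a span with `set_attribute`. It uses `time.perf_counter()` because that clock is monotonic.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def measure_ms(attributes: Dict[str, float], key: str) -> Iterator[None]:
    """Store the elapsed time of the wrapped block, in milliseconds, under `key`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even when the block raises, so failed operations are timed too.
        attributes[key] = (time.perf_counter() - start) * 1000.0


span_attributes: Dict[str, float] = {}
with measure_ms(span_attributes, "auth.duration_ms"):
    time.sleep(0.01)  # stand-in for the real operation
print(sorted(span_attributes))
```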

## Viewing Traces

Information about where and how to view your traces.

In [None]:
def display_tracing_info():
    """Display information about viewing traces"""
    
    print(" Viewing Your Traces")
    print("=" * 30)
    
    if USE_AZURE_MONITOR:
        print(" Azure AI Foundry Portal:")
        print(f"   • Navigate to your project: {os.environ.get('PROJECT_ENDPOINT', 'YOUR_PROJECT_ENDPOINT')}")
        print("   • Go to 'Tracing' in the left navigation")
        print("   • Filter and view your traces")
        print("   • Click on traces to see detailed spans")
        
        print("\n Azure Monitor Application Insights:")
        print("   • Access via 'Manage data source' in AI Foundry")
        print("   • Use End-to-end transaction details")
        print("   • Query with KQL for advanced analysis")
        print("   • Example query: dependencies | where cloud_RoleName == 'azure-ai-agents-tracing-demo'")
        
    else:
        print("  Console Output:")
        print("   • Traces are displayed in the console above")
        print("   • Each span shows start/end times and attributes")
        
    print("\n What to Look For:")
    print("   • Span durations and performance bottlenecks")
    print("   • Error rates and failure patterns")
    print("   • Agent interactions and conversation flows")
    print("   • Custom attributes and business metrics")
    print("   • User feedback correlation")
    
    print("\n Monitoring Tips:")
    print("   • Set up alerts for high error rates")
    print("   • Monitor response times and latency")
    print("   • Track user satisfaction through feedback")
    print("   • Use service name to filter multi-app environments")

display_tracing_info()

## Cleanup and Summary

Summarize what was covered. Note that this notebook does not delete the agents and threads it created, so remove them once you no longer need them.
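
A cleanup step for the agents created earlier could be sketched as below. The helper only assumes that the client object has a `delete_agent(agent_id)` method; verify that name against the SDK version you have installed. `delete_agents` and `FakeAgentsClient` are hypothetical names used to demonstrate the pattern without calling Azure.

```python
from typing import Dict, Iterable


def delete_agents(agents_client, agent_ids: Iterable[str]) -> Dict[str, str]:
    """Try to delete each agent; return {agent_id: error message} for failures."""
    failures: Dict[str, str] = {}
    for agent_id in agent_ids:
        try:
            agents_client.delete_agent(agent_id)  # assumed method name
        except Exception as exc:  # keep going so one failure does not block the rest
            failures[agent_id] = str(exc)
    return failures


class FakeAgentsClient:
    """Stand-in used only to exercise the helper locally."""

    def __init__(self):
        self.deleted = []

    def delete_agent(self, agent_id):
        if agent_id == "missing":
            raise LookupError("agent not found")
        self.deleted.append(agent_id)


fake_client = FakeAgentsClient()
failures = delete_agents(fake_client, ["agent_1", "missing", "agent_2"])
print(fake_client.deleted, failures)
```

In this notebook the agent IDs would come from `result["agent_id"]`, `workflow_result["research_agent_id"]`, and `workflow_result["writing_agent_id"]`.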

In [None]:

def display_summary():
    """Display a summary of what was covered in this notebook"""
    
    print("\n Notebook Summary")
    print("=" * 20)
    
    print(" What we covered:")
    print("   • Azure AI Foundry SDK setup and configuration")
    print("   • OpenTelemetry tracing configuration")
    print("   • Console vs Azure Monitor tracing")
    print("   • Basic agent creation and execution with tracing")
    print("   • Custom function tracing with attributes")
    print("   • User feedback collection and correlation")
    print("   • Multi-agent workflow tracing")
    print("   • Tracing best practices and patterns")
    print("   • Error handling in traced operations")
    
    print("\n Key Benefits of Tracing:")
    print("   • Enhanced debugging and troubleshooting")
    print("   • Performance monitoring and optimization")
    print("   • Better understanding of agent behavior")
    print("   • User experience insights through feedback")
    print("   • Compliance and audit capabilities")
    
    print("\n Next Steps:")
    print("   • Implement tracing in your production agents")
    print("   • Set up Application Insights monitoring")
    print("   • Create custom dashboards and alerts")
    print("   • Establish tracing standards for your team")
    print("   • Explore advanced OpenTelemetry features")
    
    print("\n Additional Resources:")
    print("   • Azure AI Foundry Documentation")
    print("   • OpenTelemetry Python SDK Documentation")
    print("   • Azure Monitor OpenTelemetry Guide")
    print("   • Semantic Conventions for AI/ML")

# Display the summary
display_summary()

print("\n ✅ Tracing agents tutorial completed successfully!")
print("Happy tracing!")