# Tracing AI Applications Using the OpenAI SDK

This notebook shows how to trace AI applications that use the OpenAI SDK with OpenTelemetry in Azure AI Foundry, so you can inspect each execution step, diagnose issues, and improve performance.

## What You'll Learn

1. **Prerequisites Setup** - Configure Azure AI Foundry project and Application Insights
2. **Enable Project Tracing** - Connect Azure Application Insights to your AI Foundry resource
3. **OpenAI SDK Instrumentation** - Automatic tracing of OpenAI API calls
4. **Azure AI Projects Integration** - Using project clients for streamlined authentication
5. **Custom Spans and Attributes** - Adding business logic tracing
6. **Console Tracing** - Local debugging with console output
7. **Advanced Tracing Patterns** - Complex workflows and error handling

## Prerequisites

- An Azure AI Foundry project
- An AI application that uses the OpenAI SDK
- Environment variables configured in a `.env` file
- An Azure Application Insights resource (connected to the project later in this notebook)

## Learning Objectives

By the end of this notebook, you will understand how to:
- Configure OpenTelemetry for Azure AI applications
- Trace OpenAI API calls automatically
- Create custom spans for business logic
- Send traces to Azure Application Insights
- Debug applications using trace data
- Implement production-ready monitoring

## 1. Environment Setup and Prerequisites Check

Let's start by setting up our environment and checking that all prerequisites are met.

In [None]:
import os
import time
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

print("🔧 Azure AI Foundry Tracing Prerequisites Check:")
print("=" * 60)

# Check required environment variables
required_vars = {
    'AZURE_AI_PROJECT_ENDPOINT': 'Azure AI Foundry project endpoint',
    'AZURE_OPENAI_DEPLOYMENT_NAME': 'OpenAI model deployment name'
}

optional_vars = {
    'APPLICATION_INSIGHTS_CONNECTION_STRING': 'Azure Application Insights connection string'
}

print("📋 Required Environment Variables:")
all_required_set = True
for var, description in required_vars.items():
    value = os.getenv(var)
    if value:
        # Truncate long values for readability (values of 50 characters or fewer are shown in full)
        display_value = f"{value[:50]}..." if len(value) > 50 else value
        print(f"  ✅ {var}: {display_value}")
    else:
        print(f"  ❌ {var}: Not set")
        print(f"     📝 {description}")
        all_required_set = False

print("\n📋 Optional Environment Variables:")
for var, description in optional_vars.items():
    value = os.getenv(var)
    if value:
        print(f"  ✅ {var}: Connection string configured")
    else:
        print(f"  ⚠️  {var}: Not set (will be configured from project)")
        print(f"     📝 {description}")

print("-" * 60)
if all_required_set:
    print("🚀 Prerequisites check passed! Ready to proceed.")
else:
    print("⚠️  Please configure missing environment variables before proceeding.")
    print("💡 Check your .env file or Azure AI Foundry project settings.")

## 2. Install Required Packages

Install the necessary packages for tracing with OpenAI SDK and Azure Monitor.
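
These packages change quickly, so it can help to see which versions are installed before running the import check. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the package list is taken from the `pip install` hint in this section.

```python
from importlib import metadata
from typing import Optional

# Distribution names as they appear in this notebook's pip install hint.
PACKAGES = [
    "azure-ai-projects",
    "azure-monitor-opentelemetry",
    "opentelemetry-instrumentation-openai-v2",
]

def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name in PACKAGES:
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

Reporting a missing package as `None` instead of raising lets the loop list every gap in one pass.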

In [None]:
# Check if required packages are available
try:
    # Azure AI Projects SDK
    from azure.ai.projects import AIProjectClient
    from azure.identity import DefaultAzureCredential
    
    # Azure Monitor OpenTelemetry
    from azure.monitor.opentelemetry import configure_azure_monitor
    
    # OpenTelemetry instrumentation
    from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
    
    print("✅ All required packages are available!")
    print("-" * 40)
    print("📦 Packages loaded:")
    print("  • azure-ai-projects")
    print("  • azure-monitor-opentelemetry") 
    print("  • opentelemetry-instrumentation-openai-v2")
    print("  • opentelemetry-sdk")
    
except ImportError as e:
    print(f"❌ Missing package: {e}")
    print("\n💡 Install missing packages with:")
    print("pip install azure-ai-projects azure-identity azure-monitor-opentelemetry opentelemetry-instrumentation-openai-v2 openai python-dotenv")
    raise

## 3. Configure Azure AI Foundry Project Client

Connect to your Azure AI Foundry project using the Azure AI Projects client.
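
A malformed endpoint is a common reason the client fails to connect, so a quick shape check before creating the client can save a round trip. This sketch assumes project endpoints look like `https://<resource>.services.ai.azure.com/api/projects/<project>`. That shape is an assumption, not an official validation rule, and `looks_like_project_endpoint` is a hypothetical helper.

```python
from urllib.parse import urlparse

def looks_like_project_endpoint(endpoint: str) -> bool:
    """Loosely check the assumed shape https://<host>/api/projects/<project>."""
    parsed = urlparse(endpoint.strip())
    # Split the path into non-empty segments, e.g. ["api", "projects", "my-project"]
    segments = [part for part in parsed.path.split("/") if part]
    return (
        parsed.scheme == "https"
        and bool(parsed.netloc)
        and len(segments) >= 3
        and segments[:2] == ["api", "projects"]
    )

# Placeholder endpoint for illustration only
print(looks_like_project_endpoint(
    "https://example.services.ai.azure.com/api/projects/my-project"
))
```

A failed check here is only a hint to re-read the endpoint value; the service remains the authority on what it accepts.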

In [None]:
def setup_ai_project_client():
    """
    Set up the Azure AI Project client with authentication.
    """
    print("🔗 Setting up Azure AI Project Client...")
    
    # Get project endpoint from environment variables
    project_endpoint = os.getenv('AZURE_AI_PROJECT_ENDPOINT')
    
    if not project_endpoint:
        raise ValueError("AZURE_AI_PROJECT_ENDPOINT not found in environment variables")
    
    # Create project client with default Azure credentials
    try:
        project_client = AIProjectClient(
            credential=DefaultAzureCredential(),
            endpoint=project_endpoint
        )
        
        print("✅ Azure AI Project client created successfully!")
        print(f"📍 Project Endpoint: {project_endpoint}")
        
        return project_client
        
    except Exception as e:
        print(f"❌ Failed to create project client: {e}")
        print("💡 Troubleshooting tips:")
        print("  • Ensure you're logged in with 'az login'")
        print("  • Verify your Azure permissions for the AI project")
        print("  • Check the project endpoint URL format")
        raise

# Initialize the project client
project_client = setup_ai_project_client()

## 4. Get Application Insights Connection String

Retrieve the Application Insights connection string from your Azure AI Foundry project.
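
An Application Insights connection string is a semicolon-separated list of `key=value` pairs that includes the instrumentation key, so it is safer to mask the key before printing or logging the string. A minimal sketch follows. The helper names are hypothetical and the example value is a placeholder, not a real resource.

```python
def parse_connection_string(connection_string: str) -> dict:
    """Split 'Key=Value;Key=Value' into a dictionary, ignoring empty segments."""
    parts = {}
    for segment in connection_string.split(";"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key.strip()] = value.strip()
    return parts

def mask_connection_string(connection_string: str) -> str:
    """Return the connection string with the instrumentation key hidden."""
    parts = parse_connection_string(connection_string)
    if "InstrumentationKey" in parts:
        parts["InstrumentationKey"] = "****"
    return ";".join(f"{key}={value}" for key, value in parts.items())

# Placeholder connection string for illustration only
example = (
    "InstrumentationKey=00000000-0000-0000-0000-000000000000;"
    "IngestionEndpoint=https://example.invalid/"
)
print(mask_connection_string(example))
```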

In [None]:
def get_application_insights_connection():
    """
    Get the Application Insights connection string from the AI project.
    """
    print("📊 Retrieving Application Insights Connection...")
    
    try:
        # Get the connection string from the project
        connection_string = project_client.telemetry.get_application_insights_connection_string()
        
        if connection_string:
            print("✅ Application Insights connection string retrieved!")
            # Do not print any part of the value: it starts with the instrumentation key
            print("🔗 Connection configured (value hidden)")
            return connection_string
        else:
            print("❌ No Application Insights connection string found")
            print("💡 Enable Application Insights in your AI Foundry project:")
            print("  1. Go to Azure AI Foundry portal")
            print("  2. Navigate to your project")
            print("  3. Select 'Tracing' in the sidebar")
            print("  4. Configure Application Insights resource")
            return None
            
    except Exception as e:
        print(f"❌ Failed to get Application Insights connection: {e}")
        print("💡 This might be expected if Application Insights is not configured")
        return None

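# Get the connection string, falling back to the optional environment variable checked earlier
connection_string = get_application_insights_connection() or os.getenv('APPLICATION_INSIGHTS_CONNECTION_STRING')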

## 5. Instrument OpenAI SDK

Set up OpenTelemetry instrumentation for the OpenAI SDK to automatically trace API calls.
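
Whether prompts and responses are recorded is controlled by the `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` environment variable. The sketch below keeps capture off unless it was explicitly enabled. It assumes the instrumentation reads the flag when `instrument()` runs, so setting the variable first is the safe order. `content_capture_enabled` is a hypothetical helper.

```python
import os

CAPTURE_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"

def content_capture_enabled(environ=os.environ) -> bool:
    """Return True only when the flag is explicitly set to 'true' (case-insensitive)."""
    return environ.get(CAPTURE_VAR, "false").strip().lower() == "true"

# Keep content capture off unless it was explicitly enabled (for example, in a dev .env file)
os.environ.setdefault(CAPTURE_VAR, "false")
print(f"Content capture enabled: {content_capture_enabled()}")
```

`setdefault` leaves an existing value untouched, so a developer who enabled capture in `.env` keeps that choice.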

In [None]:
def instrument_openai_sdk():
    """
    Instrument the OpenAI SDK for automatic tracing.
    """
    print("🔧 Instrumenting OpenAI SDK...")
    
    # Instrument the OpenAI SDK
    OpenAIInstrumentor().instrument()
    
    # Configure content capture (be careful with sensitive data)
    content_capture = os.getenv('OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT', 'false')
    if content_capture.lower() == 'true':
        print("⚠️  Content capture ENABLED - prompts and responses will be traced")
        print("   (Only enable this in development environments)")
    else:
        print("🔒 Content capture DISABLED - only metadata will be traced")
        print("   (Recommended for production environments)")
    
    print("✅ OpenAI SDK instrumentation complete!")
    print("📈 All OpenAI API calls will now be automatically traced")

# Set up OpenAI instrumentation
instrument_openai_sdk()

## 6. Configure Azure Monitor (Application Insights)

Configure OpenTelemetry to send traces to Azure Application Insights.

In [None]:
def configure_azure_monitor_tracing(connection_string):
    """
    Configure Azure Monitor to send traces to Application Insights.
    """
    if not connection_string:
        print("⚠️  Skipping Azure Monitor configuration - no connection string available")
        print("   Traces will only be visible in console output")
        return False
    
    print("☁️ Configuring Azure Monitor for tracing...")
    
    try:
        # Configure Azure Monitor with the connection string
        configure_azure_monitor(connection_string=connection_string)
        
        print("✅ Azure Monitor configured successfully!")
        print("📊 Traces will be sent to Application Insights")
        print("🔍 View traces at: Azure AI Foundry → Your Project → Tracing")
        return True
        
    except Exception as e:
        print(f"❌ Failed to configure Azure Monitor: {e}")
        print("💡 Continuing with console tracing only")
        return False

# Configure Azure Monitor if connection string is available
azure_monitor_enabled = configure_azure_monitor_tracing(connection_string)

## 7. Set Up Console Tracing

Configure console tracing for local debugging and development.

In [None]:
def setup_console_tracing():
    """
    Set up console tracing for local debugging.
    """
    print("🖥️ Setting up console tracing...")
    
    # Set up trace provider if not already configured by Azure Monitor
    if not azure_monitor_enabled:
        trace_provider = TracerProvider()
        trace.set_tracer_provider(trace_provider)
        
        # Add console exporter for local viewing
        console_exporter = ConsoleSpanExporter()
        span_processor = SimpleSpanProcessor(console_exporter)
        trace.get_tracer_provider().add_span_processor(span_processor)
        
        print("✅ Console tracing configured")
        print("📺 Traces will be displayed in console output")
    else:
        print("ℹ️  Console tracing not needed - Azure Monitor already configured")
    
    # Get tracer for creating custom spans
    tracer = trace.get_tracer(__name__)
    print("🔍 Tracer ready for custom spans")
    
    return tracer

# Set up console tracing
tracer = setup_console_tracing()

## 8. Create OpenAI Client

Get the OpenAI client from the Azure AI project for making traced API calls.

In [None]:
def setup_openai_client():
    """
    Set up the OpenAI client through the Azure AI project.
    """
    print("🤖 Setting up OpenAI client...")
    
    try:
        # Get the OpenAI client from the project
        client = project_client.get_openai_client()
        
        # Get the deployment name
        deployment_name = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
        
        if not deployment_name:
            raise ValueError("AZURE_OPENAI_DEPLOYMENT_NAME not found in environment variables")
        
        print("✅ OpenAI client configured successfully!")
        print(f"🎯 Using deployment: {deployment_name}")
        print("📡 Client is instrumented and ready for traced API calls")
        
        return client, deployment_name
        
    except Exception as e:
        print(f"❌ Failed to setup OpenAI client: {e}")
        print("💡 Troubleshooting:")
        print("  • Check your Azure AI project has OpenAI connections")
        print("  • Verify the deployment name in your environment variables")
        print("  • Ensure proper Azure permissions")
        raise

# Setup the OpenAI client
client, deployment_name = setup_openai_client()

## 9. Basic Traced API Call

Make a simple API call to test the tracing setup.

In [None]:
def make_basic_traced_call():
    """
    Make a basic OpenAI API call with tracing enabled.
    """
    print("🚀 Making basic traced API call...")
    print("-" * 40)
    
    # The OpenAI call will be automatically traced by the instrumentation
    try:
        response = client.chat.completions.create(
            model=deployment_name,
            messages=[
                {"role": "user", "content": "Write a short poem about OpenTelemetry."}
            ],
            max_tokens=100,
            temperature=0.7
        )
        
        print("✅ API call successful!")
        print(f"📝 Response: {response.choices[0].message.content}")
        print(f"📊 Token usage: {response.usage.total_tokens} total")
        print(f"   (Prompt: {response.usage.prompt_tokens}, Completion: {response.usage.completion_tokens})")
        
        if azure_monitor_enabled:
            print("\n🔍 Trace should now be visible in:")
            print("   • Azure AI Foundry → Your Project → Tracing")
            print("   • Application Insights → Transaction Search")
        else:
            print("\n📺 Check console output above for trace data")
        
        return response
        
    except Exception as e:
        print(f"❌ API call failed: {e}")
        raise

# Test basic tracing
response = make_basic_traced_call()

## 10. Custom Spans and Business Logic Tracing

Demonstrate how to add custom spans to trace business logic around AI operations.
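
When a model reply is stored as a span attribute, mapping it to a small set of known labels makes traces easier to filter and aggregate. A minimal sketch follows. The labels `True`, `False`, and `NEE` come from the prompt used in this section, while `normalize_assessment` and the `UNPARSEABLE` fallback are hypothetical.

```python
# Map lower-cased replies to the labels requested in the prompt
ALLOWED_LABELS = {"true": "True", "false": "False", "nee": "NEE"}

def normalize_assessment(raw: str) -> str:
    """Strip whitespace, punctuation, and quotes, then map the reply to a known label."""
    cleaned = raw.strip().strip(".,'\" ").lower()
    return ALLOWED_LABELS.get(cleaned, "UNPARSEABLE")

for reply in [" True.", "'False'", "nee", "It depends"]:
    print(repr(reply), "->", normalize_assessment(reply))
```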

In [None]:
def assess_claims_with_context(claims, contexts):
    """
    Example function from the documentation that assesses claims with context.
    This demonstrates custom spans around business logic.
    """
    def build_prompt_with_context(claim: str, context: str) -> list:
        return [
            {
                'role': 'system', 
                'content': "I will ask you to assess whether a particular scientific claim is supported by evidence provided. Output only the text 'True' if the claim is true, 'False' if the claim is false, or 'NEE' if there's not enough evidence."
            },
            {
                'role': 'user', 
                'content': f"""
                    The evidence is the following: {context}

                    Assess the following claim on the basis of the evidence. Output only the text 'True' if the claim is true, 'False' if the claim is false, or 'NEE' if there's not enough evidence. Do not output any other text.

                    Claim:
                    {claim}

                    Assessment:
                """
            }
        ]
    
    # Create a custom span for the entire operation
    with tracer.start_as_current_span("assess_claims_with_context") as operation_span:
        responses = []
        
        # Add operation metadata (operation_span is already the current span)
        operation_span.set_attribute("operation.claims_count", len(claims))
        operation_span.set_attribute("operation.type", "claim_assessment")
        
        print(f"🔍 Assessing {len(claims)} claims with custom tracing...")
        
        for i, (claim, context) in enumerate(zip(claims, contexts)):
            # Create a span for each individual claim assessment
            with tracer.start_as_current_span(f"assess_claim_{i+1}") as claim_span:
                claim_span.set_attribute("claim.index", i + 1)
                claim_span.set_attribute("claim.length", len(claim))
                claim_span.set_attribute("context.length", len(context))
                
                try:
                    # Make the API call (automatically traced)
                    response = client.chat.completions.create(
                        model=deployment_name,
                        messages=build_prompt_with_context(claim=claim, context=context),
                        max_tokens=10,
                        temperature=0.1
                    )
                    
                    assessment = response.choices[0].message.content.strip('., ')
                    responses.append(assessment)
                    
                    # Add result to span
                    claim_span.set_attribute("assessment.result", assessment)
                    claim_span.set_attribute("tokens.used", response.usage.total_tokens)
                    claim_span.add_event(f"Claim {i+1} assessed successfully")
                    
                    print(f"  Claim {i+1}: {assessment}")
                    
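                except Exception as e:
                    # record_exception only adds an event; set the status so the span is marked as failed
                    claim_span.record_exception(e)
                    claim_span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
                    claim_span.set_attribute("assessment.error", True)
                    print(f"  ❌ Claim {i+1} failed: {e}")
                    responses.append("ERROR")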
        
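        operation_span.add_event("All claims processed")
        # Guard against an empty input list to avoid dividing by zero
        success_count = sum(1 for r in responses if r != "ERROR")
        success_rate = success_count / len(responses) if responses else 0.0
        operation_span.set_attribute("operation.success_rate", success_rate)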
        
        return responses

# Example usage with custom spans
def demo_custom_spans():
    """
    Demonstrate custom spans with a claims assessment example.
    """
    print("🧪 Testing Custom Spans with Claims Assessment:")
    print("=" * 55)
    
    # Sample claims and contexts
    claims = [
        "The sky is blue during the day",
        "Water freezes at 100 degrees Celsius",
        "Artificial intelligence can process natural language"
    ]
    
    contexts = [
        "Scientific observations show that the sky appears blue due to Rayleigh scattering of sunlight by molecules in Earth's atmosphere.",
        "Water freezes at 0 degrees Celsius (32 degrees Fahrenheit) at standard atmospheric pressure.",
        "Modern AI systems like GPT models demonstrate advanced natural language processing capabilities through transformer architectures."
    ]
    
    # Run the assessment with custom tracing
    results = assess_claims_with_context(claims, contexts)
    
    print(f"\n📊 Assessment Results:")
    for i, (claim, result) in enumerate(zip(claims, results)):
        print(f"  {i+1}. {claim[:50]}... → {result}")
    
    print(f"\n🔍 Custom spans created:")
    print("  • 'assess_claims_with_context' - Main operation span")
    print("  • 'assess_claim_N' - Individual claim assessment spans")
    print("  • Automatic OpenAI API spans within each claim span")

# Run the demo
demo_custom_spans()

## 11. Advanced Tracing with Attributes and Events

Demonstrate advanced tracing features including custom attributes and events.
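
OpenTelemetry attribute values are limited to strings, booleans, integers, floats, and homogeneous sequences of those types. Anything else, such as a dictionary or `None`, needs converting before it is attached to a span. The sketch below shows one way to do that. `to_attribute_value` is a hypothetical helper, and serializing unsupported values as JSON is just one reasonable choice.

```python
import json
from typing import Any

PRIMITIVES = (str, bool, int, float)

def to_attribute_value(value: Any):
    """Return the value unchanged if it is a valid attribute type, else a JSON string."""
    if isinstance(value, PRIMITIVES):
        return value
    # Sequences are accepted only when every item is a primitive of the same type
    if isinstance(value, (list, tuple)) and all(
        isinstance(item, PRIMITIVES) and type(item) is type(value[0]) for item in value
    ):
        return list(value)
    # Fallback: serialize anything else (dicts, None, mixed lists) to a string
    return json.dumps(value, default=str)

print(to_attribute_value({"user": "demo", "plan": "free"}))
```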

In [None]:
def advanced_tracing_demo():
    """
    Demonstrate advanced tracing features with attributes and events.
    """
    print("🚀 Advanced Tracing Features Demo:")
    print("=" * 40)
    
    with tracer.start_as_current_span("advanced_ai_workflow") as workflow_span:
        # Set workflow-level attributes
        workflow_span.set_attribute("workflow.version", "1.0")
        workflow_span.set_attribute("workflow.type", "content_generation")
        workflow_span.set_attribute("user.session_id", "demo_session_123")
        
        # Add an event to mark workflow start
        workflow_span.add_event("Workflow started", {
            "timestamp": time.time(),
            "environment": "development"
        })
        
        # Step 1: Input processing
        with tracer.start_as_current_span("process_input") as input_span:
            input_text = "Explain quantum computing in simple terms"
            
            input_span.set_attribute("input.text_length", len(input_text))
            input_span.set_attribute("input.language", "english")
            input_span.set_attribute("processing.step", 1)
            
            # Simulate processing time
            time.sleep(0.1)
            input_span.add_event("Input processing complete")
        
        # Step 2: AI generation with multiple attempts
        for attempt in range(2):
            with tracer.start_as_current_span(f"ai_generation_attempt_{attempt + 1}") as gen_span:
                gen_span.set_attribute("generation.attempt", attempt + 1)
                gen_span.set_attribute("generation.max_tokens", 150)
                gen_span.set_attribute("generation.temperature", 0.7 + (attempt * 0.1))
                
                try:
                    response = client.chat.completions.create(
                        model=deployment_name,
                        messages=[
                            {
                                "role": "system", 
                                "content": "You are an expert at explaining complex topics in simple terms."
                            },
                            {"role": "user", "content": input_text}
                        ],
                        max_tokens=150,
                        temperature=0.7 + (attempt * 0.1)
                    )
                    
                    # Add detailed response attributes
                    gen_span.set_attribute("response.length", len(response.choices[0].message.content))
                    gen_span.set_attribute("response.finish_reason", response.choices[0].finish_reason)
                    gen_span.set_attribute("tokens.prompt", response.usage.prompt_tokens)
                    gen_span.set_attribute("tokens.completion", response.usage.completion_tokens)
                    gen_span.set_attribute("tokens.total", response.usage.total_tokens)
                    gen_span.set_attribute("model.name", response.model)
                    
                    # Add success event
                    gen_span.add_event("Generation successful", {
                        "attempt": attempt + 1,
                        "tokens_used": response.usage.total_tokens
                    })
                    
                    print(f"✅ Attempt {attempt + 1} successful:")
                    print(f"   {response.choices[0].message.content[:100]}...")
                    
                    # Success - break out of retry loop
                    break
                    
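                except Exception as e:
                    # Record the exception and mark the attempt span as failed
                    gen_span.record_exception(e)
                    gen_span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
                    gen_span.set_attribute("generation.error", True)
                    gen_span.add_event("Generation failed", {
                        "attempt": attempt + 1,
                        "error_type": type(e).__name__
                    })
                    print(f"❌ Attempt {attempt + 1} failed: {e}")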
        
        # Step 3: Post-processing
        with tracer.start_as_current_span("post_processing") as post_span:
            post_span.set_attribute("processing.step", 3)
            post_span.set_attribute("processing.type", "formatting")
            
            # Simulate post-processing
            time.sleep(0.05)
            
            # Add metadata about the final result
            if 'response' in locals():
                word_count = len(response.choices[0].message.content.split())
                post_span.set_attribute("output.word_count", word_count)
                post_span.set_attribute("output.has_technical_terms", True)
                
            post_span.add_event("Post-processing complete")
        
        # Mark workflow completion, reflecting whether any generation attempt succeeded
        succeeded = 'response' in locals()
        workflow_span.add_event("Workflow finished", {
            "total_steps": 3,
            "final_status": "success" if succeeded else "failed"
        })
        
        workflow_span.set_attribute("workflow.status", "completed" if succeeded else "failed")
        workflow_span.set_attribute("workflow.duration_category", "normal")

# Run advanced tracing demo
advanced_tracing_demo()

## 12. Console Tracing Configuration

Configure tracing to output to console for local debugging, as shown in the documentation.

In [None]:
def setup_console_only_tracing():
    """
    Set up console-only tracing for CI/CD or local development.
    This follows the console tracing example from the documentation.
    """
    print("📺 Setting up Console-Only Tracing...")
    
    # Import additional components for console tracing
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
    
    # Create a separate tracer provider for console output
    span_exporter = ConsoleSpanExporter()
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter))
    
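    # Note: this provider is created for demonstration only and is NOT registered as the
    # global provider, so it does not receive spans from the instrumented client. Spans
    # still go to whichever provider was configured earlier (Azure Monitor or console).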
    
    print("✅ Console tracing setup complete")
    print("📋 Use this configuration for:")
    print("  • Unit testing environments")
    print("  • CI/CD pipelines")
    print("  • Local development debugging")
    print("  • When Azure connectivity is not available")
    
    return tracer_provider

def demo_console_tracing():
    """
    Demonstrate console-only tracing with a simple API call.
    """
    print("\n🧪 Console Tracing Demo:")
    print("-" * 30)
    
    # Make a simple API call; it is traced by whichever global provider was configured earlier
    response = client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "user", "content": "Write a short poem on open telemetry."}
        ],
        max_tokens=80,
        temperature=0.8
    )
    
    print(f"📝 Generated poem:\n{response.choices[0].message.content}")
    print(f"\n📊 Usage: {response.usage.total_tokens} tokens")
    print("\n💡 If the console exporter is active, the JSON trace data appears in the output above")

# Set up console tracing (for demonstration)
console_tracer_provider = setup_console_only_tracing()

# Demo console tracing
demo_console_tracing()

## 13. Error Handling and Exception Tracing

Demonstrate how to properly trace errors and exceptions in AI applications.

In [None]:
def demo_error_tracing():
    """
    Demonstrate error handling and exception tracing.
    """
    print("🚨 Error Handling and Exception Tracing Demo:")
    print("=" * 50)
    
    # Test 1: Invalid model name (should fail)
    with tracer.start_as_current_span("test_invalid_model") as error_span:
        error_span.set_attribute("test.type", "invalid_model_error")
        error_span.set_attribute("test.expected_outcome", "failure")
        
        try:
            print("1️⃣ Testing invalid model name...")
            response = client.chat.completions.create(
                model="non-existent-model",
                messages=[{"role": "user", "content": "Hello"}],
                max_tokens=10
            )
            
        except Exception as e:
            # Properly record the exception in the span
            error_span.record_exception(e)
            error_span.set_attribute("error.type", type(e).__name__)
            error_span.set_attribute("error.message", str(e))
            error_span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
            
            print(f"   ❌ Expected error caught: {type(e).__name__}")
            print(f"   📝 Error message: {str(e)[:100]}...")
    
    # Test 2: Malformed request (should fail)
    with tracer.start_as_current_span("test_malformed_request") as malformed_span:
        malformed_span.set_attribute("test.type", "malformed_request")
        
        try:
            print("\n2️⃣ Testing malformed request...")
            # This should fail due to invalid max_tokens
            response = client.chat.completions.create(
                model=deployment_name,
                messages=[{"role": "user", "content": "Hello"}],
                max_tokens=-1  # Invalid value
            )
            
        except Exception as e:
            malformed_span.record_exception(e)
            malformed_span.set_attribute("error.type", type(e).__name__)
            malformed_span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
            
            print(f"   ❌ Expected error caught: {type(e).__name__}")
    
    # Test 3: Successful request with error recovery
    with tracer.start_as_current_span("test_error_recovery") as recovery_span:
        recovery_span.set_attribute("test.type", "error_recovery")
        recovery_span.set_attribute("retry.max_attempts", 3)
        
        for attempt in range(3):
            with tracer.start_as_current_span(f"retry_attempt_{attempt + 1}") as attempt_span:
                attempt_span.set_attribute("retry.attempt", attempt + 1)
                
                try:
                    print(f"\n3️⃣ Retry attempt {attempt + 1}...")
                    
                    # Simulate different error conditions for first attempts
                    if attempt == 0:
                        # Simulate a temporary failure
                        attempt_span.add_event("Simulating temporary failure")
                        raise Exception("Simulated temporary network error")
                    elif attempt == 1:
                        # Simulate another temporary failure
                        attempt_span.add_event("Simulating another temporary failure")
                        raise Exception("Simulated rate limiting error")
                    else:
                        # Success on third attempt
                        response = client.chat.completions.create(
                            model=deployment_name,
                            messages=[{"role": "user", "content": "Hello! This is a test."}],
                            max_tokens=30
                        )
                        
                        attempt_span.set_attribute("retry.success", True)
                        attempt_span.set_attribute("tokens.used", response.usage.total_tokens)
                        attempt_span.add_event("Request succeeded")
                        
                        recovery_span.set_attribute("recovery.successful_attempt", attempt + 1)
                        recovery_span.add_event("Error recovery successful")
                        
                        print(f"   ✅ Success on attempt {attempt + 1}!")
                        print(f"   📝 Response: {response.choices[0].message.content}")
                        break
                        
                except Exception as e:
                    attempt_span.record_exception(e)
                    attempt_span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
                    attempt_span.set_attribute("retry.failed", True)
                    attempt_span.set_attribute("error.type", type(e).__name__)
                    
                    print(f"   ❌ Attempt {attempt + 1} failed: {e}")
                    
                    if attempt == 2:  # Last attempt
                        recovery_span.set_status(trace.Status(trace.StatusCode.ERROR, "All retry attempts failed"))
                        recovery_span.add_event("Error recovery failed")
    
    print(f"\n✅ Error tracing demo completed!")
    print("🔍 All errors and exceptions have been properly recorded in traces")

# Run error tracing demo
demo_error_tracing()

## 14. Production Monitoring Best Practices

Learn about production-ready monitoring and tracing patterns.
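
Latency and throughput are common production metrics to attach to a span. The sketch below computes both and guards against a zero duration. `throughput_metrics` is a hypothetical helper, and basing throughput on completion tokens (rather than total tokens) is a design choice, since prompt tokens are not generated by the model.

```python
import time
from typing import Optional

def throughput_metrics(duration_seconds: float, completion_tokens: int) -> dict:
    """Return duration in milliseconds and generated tokens per second."""
    duration_ms = round(duration_seconds * 1000, 2)
    tokens_per_second: Optional[float] = None
    if duration_seconds > 0:  # avoid dividing by zero on very fast or mocked calls
        tokens_per_second = round(completion_tokens / duration_seconds, 2)
    return {"duration_ms": duration_ms, "tokens_per_second": tokens_per_second}

# perf_counter is a monotonic clock, which suits interval measurement
start = time.perf_counter()
time.sleep(0.01)  # stand-in for the model call
elapsed = time.perf_counter() - start
print(throughput_metrics(elapsed, completion_tokens=40))
```

Span attributes cannot be `None`, so skip the `tokens_per_second` attribute when the helper returns `None` for it.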

In [None]:
def demonstrate_production_patterns():
    """
    Demonstrate production monitoring patterns and best practices.
    """
    print("🏭 Production Monitoring Best Practices:")
    print("=" * 50)
    
    # Create a production-style AI service simulation
    with tracer.start_as_current_span("ai_service_request") as service_span:
        # Add service-level attributes
        service_span.set_attribute("service.name", "ai-content-generator")
        service_span.set_attribute("service.version", "1.2.3")
        service_span.set_attribute("service.instance", "instance-001")
        service_span.set_attribute("user.id", "user_456")
        service_span.set_attribute("request.id", "req_789")
        service_span.set_attribute("request.type", "content_generation")
        
        # Step 1: Input validation
        with tracer.start_as_current_span("input_validation") as validation_span:
            validation_span.set_attribute("validation.required_fields", ["content_type", "prompt"])
            validation_span.set_attribute("validation.passed", True)
            validation_span.add_event("Input validation completed")
        
        # Step 2: Rate limiting check
        with tracer.start_as_current_span("rate_limit_check") as rate_span:
            rate_span.set_attribute("rate_limit.user_id", "user_456")
            rate_span.set_attribute("rate_limit.current_usage", 45)
            rate_span.set_attribute("rate_limit.limit", 100)
            rate_span.set_attribute("rate_limit.window", "hourly")
            rate_span.set_attribute("rate_limit.allowed", True)
        
        # Step 3: Content generation
        with tracer.start_as_current_span("content_generation") as content_span:
            content_span.set_attribute("generation.model", deployment_name)
            content_span.set_attribute("generation.max_tokens", 200)
            content_span.set_attribute("generation.temperature", 0.7)
            
            start_time = time.time()
            
            try:
                response = client.chat.completions.create(
                    model=deployment_name,
                    messages=[
                        {
                            "role": "system",
                            "content": "You are a professional content writer. Create engaging, high-quality content."
                        },
                        {
                            "role": "user",
                            "content": "Write a brief introduction for a blog post about sustainable technology innovations."
                        }
                    ],
                    max_tokens=200,
                    temperature=0.7
                )
                
                generation_time = time.time() - start_time
                
                # Add comprehensive performance metrics
                content_span.set_attribute("generation.duration_ms", round(generation_time * 1000, 2))
                # Throughput is based on completion tokens, since prompt tokens are not generated
                content_span.set_attribute("generation.tokens_per_second",
                                         round(response.usage.completion_tokens / generation_time, 2))
                content_span.set_attribute("generation.prompt_tokens", response.usage.prompt_tokens)
                content_span.set_attribute("generation.completion_tokens", response.usage.completion_tokens)
                content_span.set_attribute("generation.total_tokens", response.usage.total_tokens)
                
                # Business metrics
                content_span.set_attribute("business.content_length", len(response.choices[0].message.content))
                content_span.set_attribute("business.estimated_cost_usd", 
                                         round(response.usage.total_tokens * 0.0001, 6))  # Rough estimate
                
                print(f"✅ Content generated successfully!")
                print(f"   Duration: {generation_time:.2f}s")
                print(f"   Tokens: {response.usage.total_tokens}")
                print(f"   Content: {response.choices[0].message.content[:100]}...")
                
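            except Exception as e:
                # Re-raising lets start_as_current_span record the exception and set
                # ERROR status on exit (its default behavior), so only tag the span here.
                content_span.set_attribute("generation.error", True)
                content_span.set_attribute("error.type", type(e).__name__)
                raise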
        
        # Step 4: Content quality check
        with tracer.start_as_current_span("quality_check") as quality_span:
            # message.content can be None (for example, on a filtered response)
            content_text = response.choices[0].message.content or ""
            
            # Simulate quality metrics
            word_count = len(content_text.split())
            char_count = len(content_text)
            has_proper_structure = len(content_text.split('.')) > 2
            
            quality_span.set_attribute("quality.word_count", word_count)
            quality_span.set_attribute("quality.character_count", char_count)
            quality_span.set_attribute("quality.has_proper_structure", has_proper_structure)
            quality_span.set_attribute("quality.readability_score", 85)  # Simulated
            quality_span.set_attribute("quality.passed", True)
        
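        # Add final service metrics
        # Note: start_time was captured when content generation began, so this
        # duration excludes the validation and rate-limit steps above.
        service_span.set_attribute("service.request_status", "completed")
        service_span.set_attribute("service.total_processing_time_ms", 
                                 round((time.time() - start_time) * 1000, 2))
        service_span.add_event("Service request completed successfully")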
    
    print(f"\n📊 Production Monitoring Key Metrics:")
    print("  • Request duration and throughput")
    print("  • Token usage and estimated costs")
    print("  • Error rates and types")
    print("  • Business metrics (content quality, user satisfaction)")
    print("  • Infrastructure metrics (rate limits, quotas)")
    
    print(f"\n🚨 Recommended Alerts:")
    print("  • 95th-percentile response time above threshold (e.g., > 5 seconds)")
    print("  • Error rate > 5% over 5-minute window")
    print("  • Token usage approaching monthly quota")
    print("  • Quality score dropping below threshold")

# Run production patterns demo
demonstrate_production_patterns()
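
The `business.estimated_cost_usd` attribute above multiplies total tokens by a single flat rate. Prompt and completion tokens are usually billed at different rates, so a slightly more careful estimate prices them separately. The sketch below assumes hypothetical per-1,000-token prices (`PRICE_PER_1K_PROMPT_USD`, `PRICE_PER_1K_COMPLETION_USD`) and a hypothetical helper name `estimate_cost_usd`; substitute the published rates for your own deployment.

```python
# Hypothetical prices for illustration only; look up the real rates for your model.
PRICE_PER_1K_PROMPT_USD = 0.0005
PRICE_PER_1K_COMPLETION_USD = 0.0015


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost by pricing prompt and completion tokens separately."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prompt_cost = prompt_tokens / 1000 * PRICE_PER_1K_PROMPT_USD
    completion_cost = completion_tokens / 1000 * PRICE_PER_1K_COMPLETION_USD
    return round(prompt_cost + completion_cost, 6)


# Example with made-up token counts
example_cost = estimate_cost_usd(prompt_tokens=1000, completion_tokens=2000)
print(f"Estimated cost: ${example_cost}")
```

Inside the `content_generation` span, the result could be recorded with `content_span.set_attribute("business.estimated_cost_usd", estimate_cost_usd(response.usage.prompt_tokens, response.usage.completion_tokens))`.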

## 15. Viewing Traces in Azure AI Foundry

Instructions for viewing and analyzing your traces in the Azure AI Foundry portal.

In [None]:
def show_trace_viewing_instructions():
    """
    Provide instructions for viewing traces in Azure AI Foundry and Application Insights.
    """
    print("🔍 Viewing Your Traces in Azure AI Foundry:")
    print("=" * 55)
    
    if azure_monitor_enabled:
        print("✅ Your traces are being sent to Azure Application Insights!")
        print()
        
        print("📋 Step-by-Step Instructions:")
        steps = [
            "1. 🌐 Go to Azure AI Foundry portal (ai.azure.com)",
            "2. 🔍 Navigate to your project",
            "3. 📊 Click 'Tracing' in the left sidebar",
            "4. 🎯 You should see traces from this notebook session",
            "5. 🔎 Click on any trace to see detailed span information",
            "6. 📈 Explore span attributes, events, and timing data"
        ]
        
        for step in steps:
            print(f"   {step}")
        
        print(f"\n🔍 What to Look for in Your Traces:")
        trace_elements = [
            "• 'assess_claims_with_context' - Custom business logic spans",
            "• 'chat completions' - Automatic OpenAI API call spans",
            "• 'advanced_ai_workflow' - Multi-step process spans",
            "• Error spans with exception details",
            "• Performance metrics and token usage data"
        ]
        
        for element in trace_elements:
            print(f"  {element}")
        
        print(f"\n⚡ Pro Tips for Trace Analysis:")
        tips = [
            "• Use the timeline view to identify performance bottlenecks",
            "• Filter traces by custom attributes (e.g., user.session_id)",
            "• Look for patterns in error traces to identify common issues",
            "• Monitor token usage trends across different operations",
            "• Set up alerts on high-latency or high-error operations"
        ]
        
        for tip in tips:
            print(f"  {tip}")
            
        print(f"\n📊 Alternative: Direct Application Insights Access:")
        ai_steps = [
            "1. Go to Azure Portal (portal.azure.com)",
            "2. Find your Application Insights resource",
            "3. Navigate to 'Investigate' → 'Transaction search'",
            "4. Use filters to find specific traces",
            "5. Try KQL queries in the 'Logs' section"
        ]
        
        for step in ai_steps:
            print(f"   {step}")
            
        print(f"\n🔧 Sample KQL Queries to Try:")
        queries = [
            "// Spans are stored in the 'dependencies' (and 'requests') tables, not 'traces'",
            "// Find all AI workflow spans",
            "dependencies | where name contains 'ai_workflow'",
            "",
            "// Analyze token usage",
            "dependencies | where customDimensions has 'tokens.total'",
            "| extend tokens = toint(customDimensions['tokens.total'])",
            "| summarize avg(tokens), max(tokens) by bin(timestamp, 1h)",
            "",
            "// Find failed spans",
            "dependencies | where success == false",
            "| order by timestamp desc"
        ]
        
        for query in queries:
            print(f"   {query}")
    else:
        print("❌ Azure Monitor not configured")
        print("💡 To view traces in Azure AI Foundry:")
        print("  1. Configure Application Insights in your AI Foundry project")
        print("  2. Run this notebook again to send traces to Azure")
        print("  3. For now, traces are only visible in console output")

show_trace_viewing_instructions()
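
The tip about alerting on high-latency or high-error operations comes down to two numbers per time window: a latency percentile and an error rate. As a rough local sketch (not an Azure API), the hypothetical `summarize_window` helper below computes both from a list of span-like records. The record fields `duration_ms` and `error` are assumptions for illustration, not a schema exported by Application Insights, and the sample data is made up.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_window(spans: list[dict]) -> dict:
    """Compute p95 latency and error rate for one window of span records."""
    durations = [s["duration_ms"] for s in spans]
    errors = sum(1 for s in spans if s["error"])
    return {
        "p95_ms": percentile(durations, 95),
        "error_rate": errors / len(spans),
    }


# Made-up sample window: 18 normal spans plus 2 slow, failing ones
window = [{"duration_ms": 100.0 + i, "error": False} for i in range(18)]
window += [{"duration_ms": 6000.0, "error": True}] * 2

summary = summarize_window(window)
# Example thresholds: 5 seconds for p95 latency, 5% for error rate
should_alert = summary["p95_ms"] > 5000 or summary["error_rate"] > 0.05
print(summary, should_alert)
```

In practice the same calculation would usually be expressed as an alert rule over the telemetry store rather than in application code; the sketch is only meant to make the thresholds concrete.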

## 16. Summary and Next Steps

Congratulations! You've successfully implemented comprehensive tracing for AI applications using OpenTelemetry and Azure AI Foundry.

In [None]:
def workshop_summary():
    """
    Summarize the workshop achievements and provide next steps.
    """
    print("🎯 Workshop Summary: Tracing AI Applications")
    print("=" * 60)
    
    print("✅ What You've Accomplished:")
    achievements = [
        "• Set up Azure AI Foundry project integration",
        "• Configured OpenTelemetry instrumentation for OpenAI SDK",
        "• Established Azure Application Insights connectivity",
        "• Created custom spans for business logic tracing",
        "• Implemented comprehensive error handling and exception tracing",
        "• Added detailed attributes and events for observability",
        "• Demonstrated production monitoring patterns",
        "• Set up both cloud and console tracing options"
    ]
    
    for achievement in achievements:
        print(f"  {achievement}")
    
    print(f"\n🛠️ Technical Skills Gained:")
    skills = [
        "• OpenTelemetry SDK configuration and usage",
        "• Azure AI Projects client integration",
        "• Custom span creation with attributes and events",
        "• Exception recording and error status handling",
        "• Performance monitoring and metrics collection",
        "• Production-ready observability patterns"
    ]
    
    for skill in skills:
        print(f"  {skill}")
    
    print(f"\n🚀 Next Steps for Your AI Applications:")
    next_steps = [
        "1. 📊 Implement monitoring dashboards in Application Insights",
        "2. 🚨 Set up alerts for error rates and performance thresholds",
        "3. 📈 Add business-specific metrics to track AI effectiveness",
        "4. 🔍 Use trace data to optimize AI application performance",
        "5. 🏭 Deploy with production monitoring best practices",
        "6. 📚 Explore advanced OpenTelemetry features (sampling, batching)",
        "7. 🔗 Integrate with other Azure monitoring services"
    ]
    
    for step in next_steps:
        print(f"  {step}")
    
    print(f"\n💡 Key Takeaways:")
    takeaways = [
        "• Observability is crucial for production AI applications",
        "• OpenTelemetry provides standardized, vendor-neutral tracing",
        "• Azure AI Foundry simplifies AI application monitoring setup",
        "• Custom spans enable business-specific observability",
        "• Proper error tracing helps diagnose and resolve issues quickly",
        "• Performance data guides optimization efforts"
    ]
    
    for takeaway in takeaways:
        print(f"  {takeaway}")
    
    print(f"\n🎉 Congratulations on completing this comprehensive tracing workshop!")
    print("You're now equipped to build observable, production-ready AI applications!")

# Clean up resources
def cleanup_resources():
    """
    Clean up OpenTelemetry instrumentation.
    """
    try:
        # Uninstrument OpenAI SDK
        OpenAIInstrumentor().uninstrument()
        print("\n🧹 OpenAI SDK instrumentation cleaned up")
    except Exception as e:
        print(f"\nNote: Cleanup not needed or failed: {e}")

# Display summary and cleanup
workshop_summary()
cleanup_resources()

## 📚 Additional Resources and References

### Official Documentation
- [Azure AI Foundry Tracing Documentation](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/trace-application)
- [OpenTelemetry Python Documentation](https://opentelemetry.io/docs/languages/python/)
- [Azure Monitor OpenTelemetry Documentation](https://learn.microsoft.com/azure/azure-monitor/app/opentelemetry-enable)

### Code Examples and Templates
- [Azure AI Projects SDK Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-projects)
- [OpenTelemetry Instrumentation Examples](https://github.com/open-telemetry/opentelemetry-python-contrib)

### Best Practices Guides
- [Production Monitoring for AI Applications](https://learn.microsoft.com/azure/architecture/ai-ml/guide/monitoring-ai-applications)
- [OpenTelemetry Best Practices](https://opentelemetry.io/docs/specs/otel/performance/)

### Troubleshooting
Common issues and solutions:

1. **No traces appearing**: Check Application Insights configuration and connection string
2. **High performance overhead**: Implement sampling and batch processors
3. **Missing span data**: Verify custom attribute values are strings, booleans, numbers, or homogeneous lists of those; values of other types (such as `None` or dicts) are dropped
4. **Authentication errors**: Ensure Azure credentials are properly configured
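
For the missing span data case, one common cause is an attribute value of an unsupported type: OpenTelemetry attribute values must be `str`, `bool`, `int`, `float`, or a homogeneous sequence of one of those, and the Python SDK drops other values with a warning. A minimal sketch of a guard follows, using a hypothetical helper `to_attribute_value` that is not part of any SDK; the choice to serialize dicts as JSON and `None` as `"null"` is an assumption, not a convention the SDK prescribes.

```python
import json

_PRIMITIVES = (str, bool, int, float)


def to_attribute_value(value):
    """Coerce a value into a type OpenTelemetry accepts as a span attribute."""
    if isinstance(value, _PRIMITIVES):
        return value
    if isinstance(value, (list, tuple)):
        # Sequences must be homogeneous primitives; otherwise stringify each item.
        kinds = {type(item) for item in value}
        if len(kinds) <= 1 and all(isinstance(item, _PRIMITIVES) for item in value):
            return list(value)
        return [str(item) for item in value]
    if value is None:
        return "null"
    # Dicts and other objects: fall back to a JSON (or plain) string.
    try:
        return json.dumps(value, sort_keys=True)
    except TypeError:
        return str(value)


print(to_attribute_value({"model": "my-deployment", "tokens": 42}))
print(to_attribute_value([1, "two", 3.0]))
print(to_attribute_value(None))
```

A span call would then look like `span.set_attribute("request.payload", to_attribute_value(payload))`, where `payload` stands for whatever object needs recording.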

### Environment Variables Reference
Variables used by this notebook (the first two are required; the rest are optional):
```bash
# Azure AI Foundry Project
AZURE_AI_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com/api/projects/your-project

# OpenAI Deployment
AZURE_OPENAI_DEPLOYMENT_NAME=your-model-deployment-name

# Optional: Content capture (development only)
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=false

# Optional: Direct Application Insights (if not using project client)
APPLICATION_INSIGHTS_CONNECTION_STRING=InstrumentationKey=...
```
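
As a rough sketch of how these variables might be read and validated at startup, the hypothetical `load_tracing_settings` helper below (not part of any SDK) checks the two required names, parses the content-capture flag as a boolean, and treats the connection string as optional. It accepts the environment as a mapping so it can be tried with placeholder values instead of real settings.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("AZURE_AI_PROJECT_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT_NAME")


def load_tracing_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the notebook's variables, failing fast if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")

    capture_raw = env.get("OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT", "false")
    return {
        "project_endpoint": env["AZURE_AI_PROJECT_ENDPOINT"],
        "deployment_name": env["AZURE_OPENAI_DEPLOYMENT_NAME"],
        # In this sketch, only the literal "true" (any case) enables content capture.
        "capture_content": capture_raw.strip().lower() == "true",
        # None means no Application Insights connection string was provided.
        "connection_string": env.get("APPLICATION_INSIGHTS_CONNECTION_STRING") or None,
    }


# Example with placeholder values instead of the real environment
settings = load_tracing_settings({
    "AZURE_AI_PROJECT_ENDPOINT": "https://example.invalid/api/projects/demo",
    "AZURE_OPENAI_DEPLOYMENT_NAME": "demo-deployment",
})
print(settings["capture_content"], settings["connection_string"])
```

Calling `load_tracing_settings()` with no argument reads from `os.environ`, which is where `load_dotenv()` places the values from `.env`.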

This completes the comprehensive guide to tracing AI applications using OpenAI SDK with OpenTelemetry and Azure AI Foundry!