# Semantic Kernel OpenTelemetry full-instrumentation example

## Context

Given the increasing complexity of GenAI applications, it is important to be able to monitor and trace the interactions between the components of a system. OpenTelemetry is a set of APIs, libraries, agents, and instrumentation that provide observability for applications. In this example, we demonstrate how to use OpenTelemetry with Semantic Kernel to trace the execution of a simple AI agent, **running the full stack locally using Docker and the Aspire dashboard.**

Please read the main documentation about [OpenTelemetry and Semantic Kernel](https://review.learn.microsoft.com/en-us/semantic-kernel/concepts/enterprise-readiness/observability/telemetry-with-app-insights?branch=main&tabs=Powershell&pivots=programming-language-python) for additional information.

## Prerequisites
- Docker installed
- Python 3.12 or later

## Install the required packages
```bash
pip install -r requirements.txt
```

## Run the Aspire dashboard (Docker container)
```bash
docker run --rm -it \
-p 18888:18888 \
-p 4317:18889 \
--name aspire-dashboard \
mcr.microsoft.com/dotnet/aspire-dashboard:9.0
```

## Check environment variables

Ensure you have a `.env` file in this folder, copied from `.env.example` and filled in with your own values.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from dotenv import load_dotenv
load_dotenv(override=True)
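After loading `.env`, it can help to fail early if a variable is missing. The following is a minimal sketch: the helper `missing_env_vars` is hypothetical, and the variable names are the ones read by later cells in this notebook (whether your `.env.example` lists exactly these is an assumption).

```python
import os
from collections.abc import Iterable, Mapping

# Variable names read by later cells in this notebook.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_VERSION",
    "SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS_SENSITIVE",
)


def missing_env_vars(
    names: Iterable[str], env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names that are unset or blank in `env` (hypothetical helper)."""
    return [name for name in names if not env.get(name, "").strip()]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.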

In [None]:
# NOTE logging must be configured before importing any other modules
# to ensure that all loggers are configured correctly
import logging
import sys
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)

## Key technical note

By default, Semantic Kernel outputs two different types of telemetry data: `traces` and `logs`. Agent invocations are _traced_, while chat completions are _logged_, since their payloads may exceed the size limits of some collectors.

Most OpenTelemetry backends (like Jaeger, Zipkin or MLFlow) do not capture logs. The Aspire dashboard does capture them, but it does not visualize them in the same way as traces, which can be confusing for developers trying to understand and debug the flow of their application locally.

Fortunately, we can convert these logs into spans. This is done by the `LogToSpanExporter` class below, a custom OpenTelemetry log exporter that turns each log record into its own span. This allows us to send all telemetry data to the Aspire dashboard as traces for visualization and analysis.

**Note**: this solution is not recommended for production use, as it may lead to performance issues and increased complexity. It is only intended for local development and testing purposes.
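To make the conversion concrete, here is a dependency-free sketch of the mapping the exporter applies to each record: the span name comes from the `code.function` attribute (falling back to `"log"`), and every log attribute is copied under a `log.attr.` prefix. The function `span_fields_from_log` and the sample values are hypothetical and only illustrate the shape of the data; the actual exporter works on OpenTelemetry `LogData` objects.

```python
from collections.abc import Mapping
from typing import Any, Optional


def span_fields_from_log(
    severity_text: Optional[str],
    body: Any,
    attributes: Optional[Mapping[str, Any]] = None,
) -> tuple[str, dict[str, Any]]:
    """Build the span name and span attributes for one log record (illustrative)."""
    attrs = dict(attributes or {})
    # Use the emitting function as the span name when it is known.
    span_name = attrs.get("code.function", "log")
    span_attributes = {
        "log.severity": severity_text,
        "log.message": body,
        # Prefix the original attributes so they cannot clash with span fields.
        **{f"log.attr.{key}": value for key, value in attrs.items()},
    }
    return span_name, span_attributes


# Hypothetical record, for illustration only.
name, fields = span_fields_from_log(
    "INFO", "chat completion finished", {"code.function": "invoke"}
)
print(name, fields)
```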

In [None]:
from opentelemetry import trace
from opentelemetry.sdk._logs import LogData
from opentelemetry.sdk._logs.export import LogExporter, LogExportResult
import logging

logger = logging.getLogger(__name__)

class LogToSpanExporter(LogExporter):
    def __init__(self, tracer=None):
        self._tracer = tracer or trace.get_tracer(__name__)
        logger.info(f"Using tracer: {self._tracer}")

    def export(self, batch: list[LogData]) -> "LogExportResult":
        """
        Called by the LogPipeline when a batch of logs is ready.
        We convert each LogRecord into its own span.
        """
        logger.info(f"Exporting {len(batch)} log records as spans")
        for data in batch:
            try:
                if not isinstance(data, LogData):
                    logger.warning(f"Skipping non-log record: {data}")
                    continue
                # Create a span for each log record
                # Attributes may be missing, so fall back to an empty mapping
                span_name = (data.log_record.attributes or {}).get("code.function", "log")
                with self._tracer.start_as_current_span(
                    span_name,
                    kind=trace.SpanKind.INTERNAL,
                    attributes={
                        "log.severity": data.log_record.severity_text,
                        "log.message": data.log_record.body,
                        **{f"log.attr.{k}": v for k, v in (data.log_record.attributes or {}).items()},
                    },
                    start_time=data.log_record.timestamp,    # uses the log’s timestamp
                ) as span:
                    logger.info(f"Exporting log record as span: {span_name}")
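                    # Optionally, mark error spans.
                    # OpenTelemetry severity numbers run from 1 to 24; ERROR starts at 17.
                    severity = data.log_record.severity_number
                    if severity is not None and severity.value >= 17:
                        span.set_status(trace.Status(trace.StatusCode.ERROR))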
            except Exception as e:
                logger.exception(f"Failed to export log record as span: {e}")
                return LogExportResult.FAILURE

        # Report success only after the whole batch has been processed
        return LogExportResult.SUCCESS

    def shutdown(self):
        # Clean up if needed
        return



## Setting up tracing (spans)

The first step is to set up the OpenTelemetry SDK to export traces.

In [None]:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry import trace
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# 1. Set up the provider and exporter (OTLP exporter uses env vars for config)
tracer_provider = TracerProvider()
otlp_exporter = OTLPSpanExporter()
processor = SimpleSpanProcessor(otlp_exporter)

tracer_provider.add_span_processor(processor)

# 2. Register the provider as global
trace.set_tracer_provider(tracer_provider)
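Because `OTLPSpanExporter()` is created without arguments, its target comes from environment variables. The sketch below mirrors the lookup order as I understand it from the OpenTelemetry specification: the signal-specific variable wins over the generic one, and the gRPC default is `http://localhost:4317`. The helper `resolve_otlp_endpoint` is hypothetical and is not part of the SDK.

```python
import os
from collections.abc import Mapping

# Default OTLP/gRPC endpoint; it matches the `-p 4317:18889` port mapping above.
DEFAULT_GRPC_ENDPOINT = "http://localhost:4317"


def resolve_otlp_endpoint(signal: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the endpoint an OTLP gRPC exporter would likely use (illustrative).

    `signal` is one of "traces", "logs" or "metrics".
    """
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    generic = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    return specific or generic or DEFAULT_GRPC_ENDPOINT


print(resolve_otlp_endpoint("traces"))
```

With the Aspire container started as shown earlier, leaving both variables unset should be enough, since the dashboard's OTLP port is published on `localhost:4317`.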

## Setup logging

This is where we inject the custom `LogToSpanExporter` class to convert logs to spans. This allows us to send all telemetry data to the Aspire dashboard as traces for visualization and analysis.

**NOTE** you can run a comparison with the default `OTLPLogExporter` class, which will show the logs in the Aspire dashboard but not in the timeline view.

**NOTE #2** when you change the `OTLPLogExporter` to `LogToSpanExporter`, you will need to restart the Jupyter kernel to see the changes. 

In [None]:
import logging
from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import SimpleLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs.export import ConsoleLogExporter

logger_provider = LoggerProvider()

# 📢 Custom log exporter that sends logs to the OTLP exporter as spans
log_exporter = LogToSpanExporter()

# 📢 Or use the OTLPLogExporter directly if you want to send logs in OTLP format
# log_exporter = OTLPLogExporter()

logger_provider.add_log_record_processor(SimpleLogRecordProcessor(log_exporter))
# Optionally, add a console exporter for local debugging
logger_provider.add_log_record_processor(SimpleLogRecordProcessor(ConsoleLogExporter()))
# Sets the global default logger provider
set_logger_provider(logger_provider)

# Create a logging handler to write logging records, in OTLP format, to the exporter.
handler = LoggingHandler()
# Add filters to the handler to only process records from semantic_kernel.
handler.addFilter(logging.Filter("semantic_kernel"))
# Attach the handler to the root logger. `getLogger()` with no arguments returns the root logger.
# Events from all child loggers will be processed by this handler.
logger = logging.getLogger()
logger.addHandler(handler)
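The `logging.Filter("semantic_kernel")` call matches on the logger hierarchy, not on a plain string prefix. A small standard-library sketch of that behaviour follows; the helper `passes_filter` is hypothetical, and the logger names are examples chosen for illustration.

```python
import logging


def passes_filter(filter_name: str, logger_name: str) -> bool:
    """Return True if a record from `logger_name` passes logging.Filter(filter_name)."""
    record = logging.makeLogRecord({"name": logger_name})
    return bool(logging.Filter(filter_name).filter(record))


# Example logger names (illustrative only).
for candidate in (
    "semantic_kernel",
    "semantic_kernel.agents",
    "semantic_kernel_extras",
    "openai",
):
    print(candidate, passes_filter("semantic_kernel", candidate))
```

Only the logger itself and its dotted children pass, so records from other libraries never reach the OpenTelemetry handler.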


## Setup metrics (optional)

In [None]:
# Initialize a metric provider for the application. This is a factory for creating meters.
from opentelemetry.metrics import set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.view import DropAggregation, View
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
# OTEL metrics exporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

exporter = OTLPMetricExporter()

meter_provider = MeterProvider(
    metric_readers=[PeriodicExportingMetricReader(exporter, export_interval_millis=5000)],
    views=[
        # Dropping all instrument names except for those starting with "semantic_kernel"
        View(instrument_name="*", aggregation=DropAggregation()),
        View(instrument_name="semantic_kernel*"),
    ],
)
# Sets the global default meter provider
set_meter_provider(meter_provider)
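The two views above rely on wildcard matching of instrument names. As a rough sketch, assuming the SDK matches names with shell-style patterns (as Python's `fnmatch` does), the effect can be reproduced with the standard library. The helpers and the instrument names below are illustrative and not taken from the SDK.

```python
from fnmatch import fnmatch

# (pattern, aggregation) pairs mirroring the two views configured above.
VIEW_PATTERNS = [("*", "drop"), ("semantic_kernel*", "default")]


def matching_views(instrument_name: str) -> list[str]:
    """Return the aggregation of every view whose pattern matches the name."""
    return [agg for pattern, agg in VIEW_PATTERNS if fnmatch(instrument_name, pattern)]


def is_exported(instrument_name: str) -> bool:
    """An instrument is exported if at least one matching view does not drop it."""
    return any(agg != "drop" for agg in matching_views(instrument_name))


# Illustrative instrument names.
for name in ("semantic_kernel.function.invocation.duration", "http.client.duration"):
    print(name, matching_views(name), is_exported(name))
```

An instrument that matches only `*` is dropped, while one that also matches `semantic_kernel*` keeps a stream with the default aggregation.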

In [None]:
# Check SK environment variables are set, otherwise raise an error since SK won't write to OTLP
import os
assert os.getenv("SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS_SENSITIVE") == "true"

In [None]:
import os
from openai import AsyncAzureOpenAI
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(
    credential, "https://cognitiveservices.azure.com/.default"
)


client = AsyncAzureOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        azure_deployment=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"),
        azure_ad_token_provider=token_provider,
        api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    )
service = AzureChatCompletion(
        deployment_name=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"),
        async_client=client,
    )

In [None]:
from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.agents.chat_completion.chat_completion_agent import ChatHistoryAgentThread

thread = ChatHistoryAgentThread()

agent = ChatCompletionAgent(
    id="chat-completion-agent",
    name="ChatCompletionAgent",
    service=service,
    instructions="""You are a helpful assistant. You tell jokes""",
)

In [None]:
async for r in agent.invoke(messages="A joke about developers", thread=thread):
    print(r)

## Results

## 1. Logs rendered as spans
![Logs rendered as traces](./logs_as_traces.png)

## 2. Default logs
![Default logs](./default_logs.png)

![Logs table](./logs_table.png)